Abstract


Many games are naturally described as a sum of games, e.g., nim and
the endgame of Go. Let $G_1, \ldots, G_n$ represent n games. Then a move in the
sum $G_1 + \cdots + G_n$ consists of picking a component game $G_i$ and making
a move in $G_i$. This thesis analyzes play in a sum of games from three
different perspectives: computational complexity, approximate solutions, 
and optimal search algorithms. 

Lockwood Morris [17] proves that the problem of determining the optimal
strategy in a sum of games is PSPACE-complete. This thesis proves
that the problem is PSPACE-complete even when the component games
are so simple that they can be represented as depth two trees.

Hanner [7] shows that the value of a sum of games can be approximated
to within the maximum temperature of the component games. This thesis
presents a clear and concise proof of Hanner’s bounds. This thesis also 
improves upon Hanner’s result. It shows that the value of a sum of games 
can be approximated to within the second highest temperature. 

This thesis describes how Berliner’s B* search algorithm[4] can be ef- 
fectively combined with the approximate solutions to speed up the search 
for an optimal solution.
Chapter 1


Introduction 


Let $G_1, \ldots, G_n$ represent n games. The sum,

$$G_1 + \cdots + G_n,$$

is a game in which a move consists of picking a component game $G_i$ and
making a move in $G_i$. The sum is over when a player is unable to make a
move in any of the component games.

Many games are naturally described as a sum of games. Nim is the 
sum of a number of simple heap games. A hackenbush game is a sum of a 
number of distinct hackenbush pictures. 

Other common examples occur when, during the course of a game, a 
single integrated game becomes decomposed into a sum of games. For
example, when the game dots and boxes is played, the board quickly becomes
divided up into a number of isolated dots and boxes games. Figure 1.1
shows a dots and boxes game that is the sum of three distinct games. To
move in the sum, a player decides which area to play in, and then draws a 
line in that area. 

The game of Go is another example. Towards the end of the game, 
the black and white stones divide the board into separate territories. The 
endgame of Go revolves around a number of small, independent, border 
disputes and is naturally characterized as a sum of games. To move in the 
sum, a player decides which area to play in, and then places a stone in that
area. 

The endgame of Go is an important example because of the widespread
interest in the game. Go is played professionally in the Orient at the level
that chess is played in the West. Furthermore, Go has attracted a reasonable
amount of attention from the A.I. community as being a good domain
for studying problem solving issues, e.g., [1], [13], [23], and [24]. In fact,
the endgame of Go was the author’s original motivation for studying sums
of games.


Figure 1.1: A Dots and Boxes Game as a Sum of Games

This thesis analyzes the play in a sum of games from three different per- 
spectives: computational complexity, approximate solutions, and optimal 
search algorithms. 

Call SUM the problem of determining the winning strategy in a sum of 
games. The goal, in analyzing the computational complexity of SUM, is to 
determine the theoretical bound on how efficient any algorithm for solving 
SUM or a restricted subclass of SUM can be. Lockwood Morris [17] proves
that SUM is PSPACE-complete. Hence, assuming that P ≠ PSPACE,
no polynomial time algorithm can compute an optimal strategy for an arbitrary sum of
games. However, Morris’s proof leaves open the possibility that there exist
interesting restricted subclasses of SUM that can be solved in polynomial 
time. 

This thesis narrows the gap between those problems known to be in P 
and those that are known to be PSPACE-complete. It proves that SUM 
is PSPACE-complete even when the component games are so simple that
they can be represented as depth two trees. 

Since SUM is PSPACE-complete, it is unlikely that an efficient algo- 
rithm for solving it will be discovered. Two alternate approaches exist for 
coping with an instance of SUM. The first is to relax the criterion for success.
Instead of requiring an optimal solution, an algorithm is only required to
produce a solution that is close to optimal. The second alternative is to do 
an exponential time search for the optimal solution. 

Approximate solutions are beneficial when speed is critical and small 
errors in the solution are tolerable. Hanner [7] proves that the value of
a sum of games can be approximated to within the maximum temperature of
the component games. These bounds are surprisingly accurate. They are 
independent of the number of component games in the sum and independent 
of the complexity of the component games. This thesis presents a clear and 
concise proof of Hanner’s bounds. 

This thesis also improves upon Hanner’s result. It shows that the value
of a sum of games can be approximated to within the second highest tem- 
perature. 

When the optimal solution to an instance of SUM is required, the only 
currently available technique is exponential time search. However, the approximate
solutions described in this thesis provide a great deal of power
in directing and pruning search. In particular, this thesis describes how
Berliner’s B* search algorithm [4] can be effectively combined with the approximate
solutions to speed up the search for an optimal solution.

The remainder of this thesis is organized as follows: Chapter 2 gives the 
basic definitions used throughout the thesis. These include the definitions
of a game, a sum of games, and taxation. Chapter 3 analyzes optimal play
for some simple restricted classes of SUM. Chapter 4 proves that SUM is 
PSPACE-complete even when the component games have depth two or 
less. Chapter 5 considers the two basic approaches for coping with a sum 
of games: approximate solutions and optimal search algorithms. Chapter 6 
considers how the theory of sums of games as described in this thesis applies 
to the endgame of Go. Chapter 7 summarizes the results and considers 
future research directions. 


Chapter 2 


Basics 


Conway developed a unified theory of numbers and games. The theory is 
beautiful, elegant, and fun. The reader is encouraged to read [5], [11], and 
[2] for an extensive introduction. [5] provides a presentation of the whole 
theory. [11] presents a simple introduction to the development of numbers. 
[2] presents an extensive treatment of the theory with respect to games. 

This chapter presents the portion of the theory used in this thesis. In 
particular, the notions of a game, a sum of games, and taxation are described.
However, the chapter does not go into the development of numbers, but 
assumes that numbers, such as 5 and -3½, are well defined.


2.1 Games 


Consider a two person, zero sum, perfect information game like chess, check- 
ers, sprouts, or nim. Let the two players be called Left and Right. It is 
natural to represent such a game as a tree, where 

• the nodes represent positions,

• the root is the initial position,

• an edge between nodes a and b represents a valid move from
position a to position b,

• leaf nodes represent stopping positions, and

• the value of a leaf represents the payoff to Left when the
game reaches that final position.


Figure 2.1: Defining $H_1$


By definition, Left wins the game if the final score (his payoff) is greater
than zero, or if the final score equals zero and it is Right’s turn to play
when the game ends.

Moves for Left are distinguished from moves for Right by the direction 
of the edges. In particular, moves for Left are represented by edges that 
go down and to the left. Moves for Right are represented by edges that go 
down and to the right. 

For example, consider the game shown in Figure 2.1. If Right plays
first, then his only move is to -7. He collects 7 from Left. If Left plays
first, then Left has two options. He can either move to 20, thus ending the
game and collecting 20 from Right, or he can move to

{ 100 | 15 }

and when Right responds, Left collects 15. Since Left prefers gaining 20
over gaining 15, Left’s optimal move is to 20.

This representation of a game is slightly unconventional in that at every 
position moves for both Left and Right are represented. Representing moves 
for both players at every position is necessary to handle sums of games. 

The following notation is used to specify a game in a more textual way
than as a picture of a tree: if $L_1, \ldots, L_n$ and $R_1, \ldots, R_n$ are games then

$$\{ L_1, \ldots, L_n \mid R_1, \ldots, R_n \}$$

is a game where Left has the option of moving to one of the $L_i$ games, and
Right has the option of moving to one of the $R_i$ games. For example, in
this notation, the game shown in Figure 2.1 becomes:

$$\{ 20, \{ 100 \mid 15 \} \mid -7 \}$$


$V_L(H_2) = 25$, $V_R(H_2) = -15$
$L_1(H_2) = \{ 100 \mid \{ 25 \mid 20 \} \}$, $R_1(H_2) = -15$
$L_2(H_2) = \{ 25 \mid 20 \}$
$L_3(H_2) = 25$

Figure 2.2: $V_L(H_2)$, $V_R(H_2)$, $L_n(H_2)$ and $R_n(H_2)$


The following notation is used to specify the position a game reaches 
after n moves and the value of a game: 


$L_n(G)$ ⇒ The position that results after n moves when both
players play alternately, both players play optimally, and
Left starts.


$R_n(G)$ ⇒ has the same meaning as $L_n(G)$ except that Right
starts the play in G.


$V_L(G)$ ⇒ is the final score when both players play alternately,
both players play optimally, and Left starts.


$V_R(G)$ ⇒ is the same as $V_L(G)$ except that Right starts the play
in G.


For example, Figure 2.2 shows the notation applied to H2. 
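
The definitions of the two values translate directly into a pair of mutually recursive functions. What follows is a minimal sketch, not the thesis's own code. It assumes a game is encoded either as a plain number or as a pair of tuples (Left's options, Right's options); that encoding and the names `v_left` and `v_right` are illustrative choices. The tree for H2 is inferred from the values listed with Figure 2.2.

```python
# Illustrative encoding (an assumption, not from the thesis): a game is either
# a number (a finished position) or a pair (lefts, rights) of option tuples.

def is_number(g):
    return isinstance(g, (int, float))

def v_left(g):
    """Final score when Left starts and both sides play optimally."""
    if is_number(g):
        return g
    lefts, _ = g
    # Left picks the option that is best for him once Right is to move.
    return max(v_right(option) for option in lefts)

def v_right(g):
    """Final score when Right starts and both sides play optimally."""
    if is_number(g):
        return g
    _, rights = g
    # Right picks the option that is worst for Left once Left is to move.
    return min(v_left(option) for option in rights)

# { 20, { 100 | 15 } | -7 }, the game written out in the text above.
H1 = ((20, ((100,), (15,))), (-7,))
# { { 100 | { 25 | 20 } } | -15 }, inferred from the values listed for Figure 2.2.
H2 = ((((100,), (((25,), (20,)),)),), (-15,))

print(v_left(H1), v_right(H1))
print(v_left(H2), v_right(H2))
```

For a single game this recursion visits each position a bounded number of times, which is why a lone game tree is easy to evaluate.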


This thesis only considers games for which both players want to play, 
i.e., a player gains more by playing in the game than by allowing his opponent
to play. More precisely, for all games G and for each position G’ in
G:

$$V_L(G') > V_R(G')$$


When the condition does not hold, the game is called a number. Again the 
reader is referred to [5] and [11] for a more detailed treatment of numbers. 


2.2 Sum of Games 


Let $G_1, \ldots, G_n$ represent n games. Then a move in the sum

$$G = G_1 + \cdots + G_n$$

consists of picking a component game $G_i$ and making a move in $G_i$. G
terminates when each component game, $G_i$, is reduced to a number. The
final score is the sum of the final positions.

For example, Figure 2.3 shows optimal play in the sum of two games. 
Left begins by playing in the second component game and thus preventing 
Right from gaining —80. Right is not obliged to respond in the same 
game that Left played in. In fact, Right’s optimal move is to play in the 
first component game. Finally, on the third move, Left plays in the only 
remaining game. The final score is positive, so Left wins. 

As with a single game, Left’s optimal strategy in a sum of games is to 
maximize the final score. However, unlike a single game where the optimal 
strategy can be determined efficiently by a min-max procedure, there appears
to be no fast algorithm for determining the optimal strategy in a sum of 
games. 
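
To make the difficulty concrete, a sum can always be solved by exhaustive search over every choice of component and move. The sketch below does exactly that. It assumes the same illustrative encoding as a number or a pair (Left options, Right options), and the name `sum_value` is hypothetical. The number of reachable positions grows quickly with the number of components, so this is a baseline and not an efficient method.

```python
from functools import lru_cache

def is_number(g):
    return isinstance(g, (int, float))

@lru_cache(maxsize=None)
def sum_value(components, left_to_move):
    """Optimal final score of a sum of games, found by exhaustive search.

    components is a tuple of games; each game is a number or a pair
    (lefts, rights) of option tuples (an illustrative encoding).
    """
    # The sum is over once every component has been reduced to a number.
    if all(is_number(g) for g in components):
        return sum(components)
    outcomes = []
    for i, g in enumerate(components):
        if is_number(g):
            continue  # no moves remain in a finished component
        lefts, rights = g
        for option in (lefts if left_to_move else rights):
            rest = components[:i] + (option,) + components[i + 1:]
            outcomes.append(sum_value(rest, not left_to_move))
    return max(outcomes) if left_to_move else min(outcomes)

def switch(x, y):
    """The game { x | y }."""
    return ((x,), (y,))

two = (switch(100, -100), switch(5, -5))
print(sum_value(two, True))    # Left takes 100, Right then takes -5
print(sum_value(two, False))
```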


2.3. Sente 


Sente is a Japanese word often used to describe a move in a Go game. A 
move in a sum of games is sente if it forces the opponent to respond locally,
i.e., in the same component game.

The notion of sente is important in understanding optimal play in a sum
of games. For example, consider $H_3$ and $H_4$ shown in Figure 2.4. Left’s
optimal move in game $H_3$ is to collect 15. Left’s optimal move in $H_4$ is to
collect -10. However, if Left plays either of these locally optimal moves in
$H_3 + H_4$, then Right will respond in the other component game and win.
Left’s optimal move in $H_3 + H_4$ is to { 100 | 12 }. This forces Right to
respond locally and take 12. Left is then able to play first in the second
component game. The final score is 12 - 10 = 2 and Left wins.


Figure 2.3: Optimal Play in a Sum of Two Games


Figure 2.4: Defining $H_3$ and $H_4$

Left’s move to { 100 | 12}, though not locally optimal, forces a response 
from Right. It is a sente move. Using chess terminology, it allows Left to 
keep tempo. 


2.4 The Negative of a Game 


The negative of a game is obtained by reversing the roles of Left and Right. 
In particular, if G = { L | R}, then the negative of G is defined as follows: 


$$-G = \{ -R \mid -L \}$$

For example:

$$-\{ 5 \mid 4 \} = \{ -4 \mid -5 \}$$
$$-\{ \{ 10 \mid 5 \} \mid -4 \} = \{ 4 \mid \{ -5 \mid -10 \} \}$$

For any game G, the first player to move in

$$G + (-G)$$

will always lose. Without loss of generality, assume Left starts. For every
move Left can make, Right can respond in the other component game with
the negative of Left’s move. Hence, Right can cancel any gain Left makes
by going first. The final score will be zero with Left to play, but he has no
play, so he loses.


$H_6 = \{ 100 \mid -100 \}$
$Mean(H_5) = 20$, $Mean(H_6) = 0$, $Mean(H_7) = -15$

Figure 2.5: Defining $H_5$, $H_6$, and $H_7$
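
Negation is a purely structural operation, so it can be sketched in a few lines. The code below assumes the illustrative encoding of a game as a number or a pair (Left options, Right options); the function name `negate` is not from the thesis. It reproduces the second example given for the definition.

```python
def is_number(g):
    return isinstance(g, (int, float))

def negate(g):
    """Swap the roles of Left and Right: -{ L | R } = { -R | -L }."""
    if is_number(g):
        return -g
    lefts, rights = g
    # Right's negated options become Left's options, and vice versa.
    return (tuple(negate(r) for r in rights),
            tuple(negate(l) for l in lefts))

G = ((((10,), (5,)),), (-4,))    # { { 10 | 5 } | -4 }
print(negate(G))                 # → ((4,), (((-5,), (-10,)),))
```

Applying `negate` twice returns the original game, which matches the intuition that swapping the players twice changes nothing.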


2.5 Mean Value 


The mean value of a game was first defined by Hanner[7] and used to 
approximate the value of a sum of games. The mean value of a game 
measures the inherent worth of a game not counting the advantage a player 
gains by moving first. 

Let nG represent the sum of n copies of G, i.e.,

$$nG = \underbrace{G + \cdots + G}_{n}.$$

The mean value of G is defined to be

$$\mathrm{Mean}(G) = \lim_{n \to \infty} \frac{V_L(nG)}{n}.$$

The limit always exists [7].

One way to compute the mean value of G is to analyze optimal play in 
nG for large n. This is best described through examples. What follows is 
a description of how the mean value for each of the three games shown in 
Figure 2.5 is computed. 

Consider optimal play in the game $nH_5$. Every time Left plays, Left
reduces a component game to { 100 | 20 }. In doing so, Left threatens to
take 100. This forces Right to respond and reduce the switch to 20. Hence,
each component game of $nH_5$ is reduced to 20, $V_L(nH_5) = 20n$, and

$$\mathrm{Mean}(H_5) = \lim_{n \to \infty} \frac{20n}{n} = 20$$

Next, consider optimal play in the game $nH_6$ where n is even. Since the
mean value is defined as a limit as n goes to infinity, no generality is lost
by assuming that n is even. Every play by Left in $nH_6$ is balanced by a
play by Right, i.e., every time Left plays in a switch and takes 100, Right
responds in another switch and takes -100. Hence, $V_L(nH_6) = 0$ and

$$\mathrm{Mean}(H_6) = \lim_{n \to \infty} \frac{0}{n} = 0$$

Finally, consider play in the game $nH_7$. Without loss of generality,
assume that n is a multiple of 4. Play is divided into two stages. During
the first stage, Left’s optimal move is always to reduce one of the $H_7$ games
to { 20 | 0 } and Right’s optimal response is to reduce one of the $H_7$ games
to -40. At the end of the first stage, the game has been reduced to:

$$\underbrace{\{ 20 \mid 0 \} + \cdots + \{ 20 \mid 0 \}}_{n/2} + \tfrac{n}{2}(-40).$$

During the second stage, Left and Right will play out the n/2 switches. The
final result is that half of the $H_7$ games are reduced to -40, a quarter of
them are reduced to 20 and a quarter of them are reduced to 0. Therefore:

$$\mathrm{Mean}(H_7) = \lim_{n \to \infty} \frac{-40 \cdot \frac{n}{2} + 20 \cdot \frac{n}{4} + 0 \cdot \frac{n}{4}}{n} = -15$$

Determining the mean value of a game G by analyzing play in nG is
cumbersome. Section 2.8 provides an efficient method for computing the
mean value of a game.
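
The two-stage analysis of the third example can be checked by brute force for small n. The sketch below assumes, from the description above, that $H_7$ is the game { { 20 | 0 } | -40 }. Because all copies are identical, a position is summarized by two counts: the number of untouched copies and the number of copies already reduced to the switch { 20 | 0 }. The function name `remaining_score` and this counting scheme are illustrative choices, not the thesis's method.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def remaining_score(a, b, left_to_move):
    """Score still to be collected with a copies of H7 = { { 20 | 0 } | -40 }
    and b copies of the switch { 20 | 0 } left on the table."""
    if a == 0 and b == 0:
        return 0
    options = []
    if left_to_move:
        if a:
            options.append(remaining_score(a - 1, b + 1, False))   # H7 -> { 20 | 0 }
        if b:
            options.append(20 + remaining_score(a, b - 1, False))  # { 20 | 0 } -> 20
        return max(options)
    if a:
        options.append(-40 + remaining_score(a - 1, b, True))      # H7 -> -40
    if b:
        options.append(0 + remaining_score(a, b - 1, True))        # { 20 | 0 } -> 0
    return min(options)

n = 4
print(remaining_score(n, 0, True) / n)    # → -15.0
```

Larger multiples of 4 can be tried the same way; the recursion depth is at most 2n, so very large n would need an iterative version.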


2.6 Taxation 


It is useful to consider the value of a game when a tax of t is imposed on
every move in the game. When a game is taxed and it is Left’s turn to play,
Left either pays Right t and makes a move, or passes. Similarly, on Right’s
turn, Right either pays Left t and makes a move, or Right passes.


Figure 2.6: Defining $H_8$


For example, consider play in $H_8$ (Figure 2.6) when the tax is 3. If
Left pays 3 to take 12, then he gains 9. However, if Left pays 3 to move
to { 20 | 10 }, then when Right responds, Right pays 3 to move to 10.
The taxes Left paid to Right and the taxes Right paid to Left balance out.
Left gets a net gain of 10. Hence, when t = 3, Left’s optimal move is to
{ 20 | 10 }.

Though it is not immediately apparent, taxation is a powerful concept. 
It is at the heart of some good heuristics for playing in a sum of games 
described in Chapter 5. 

This section defines the temperature of a game, and then specifies a 
simple equation for the value of a game when a tax of t is imposed on every 
move. 


2.6.1 Temperature 


The temperature of a game, o(G), is the maximum tax that can be imposed
and still have Left and Right willing to play first in G. For example, consider
the game $H_9$ (Figure 2.7) and consider how the players react when a tax of
t is imposed on every move for different values of t:


t < 100: Both players want to play first in $H_9$. That is, a player
prefers paying t to his opponent and gaining 100 to having
his opponent pay him t but losing 100.


t = 100: Both players are indifferent; it does not matter who
plays first.


t > 100: Neither player will want to play first.


$H_9 = \{ 100 \mid -100 \}$, $H_{10} = \{ 5 \mid -5 \}$
$Mean(H_9) = 0$, $Mean(H_{10}) = 0$
$o(H_9) = 100$, $o(H_{10}) = 5$

Figure 2.7: Defining $H_9$ and $H_{10}$


Since the players are willing to pay a tax of t < 100 in order to play first
in $H_9$, $o(H_9) = 100$.

Intuitively, temperature measures how excited a player is to play in a
game G. By quantifying how much a player is willing to pay in order to
move in G, it provides some measure for the worth of a move in G. For
example, consider the two games in Figure 2.7. If the two games appear in
a sum of games, then both players will be very anxious to play first in $H_9$
but will be relatively indifferent as to who plays first in $H_{10}$. Temperature
captures this idea, since $o(H_9) \gg o(H_{10})$.

Temperature measures the value of a move in a game. Mean value 
measures the inherent worth of the game having explicitly factored out the 
advantage a player gains for moving first in a game. Together, they provide 
a good first order approximation to a game. 


2.6.2 The Value of a Taxed Game 


The value of a game G for a player can be defined recursively as what his
opponent gains after he has made one move, i.e., if G is not a number then:

$$V_L(G) = V_R(L_1(G)) \qquad (2.1)$$
$$V_R(G) = V_L(R_1(G)) \qquad (2.2)$$

This subsection formulates similar equations for the value of a game
when a tax of t is imposed. The following notation is used:


${}^tL_n(G)$ ⇒ The position that results after n moves when a tax
of t is imposed on each move, players alternate moves, Left starts, and
both players play optimally.

${}^tR_n(G)$ ⇒ has the same meaning as ${}^tL_n(G)$ except that Right
starts the play in G.


${}^tV_L(G)$ ⇒ is the final score when a tax of t is imposed on each
move, players alternate turns, Left starts, and both players
play optimally.


${}^tV_R(G)$ ⇒ is the same as ${}^tV_L(G)$ except that Right starts the
play in G.


If t > o(G) then neither player is willing to move in G. In that case
it is convenient to define ${}^tV_L(G)$ and ${}^tV_R(G)$ to be the same value as when
t = o(G). If t < o(G), then the value for Left is the value for Right in ${}^tL_1(G)$
minus the tax t that Left paid to Right. Thus:

$${}^tV_L(G) = \begin{cases} {}^tV_R({}^tL_1(G)) - t & \text{if } t < o(G) \\ {}^{o(G)}V_L(G) & \text{if } t \ge o(G) \end{cases} \qquad (2.3)$$

where o(G) is the minimum t such that ${}^tV_L(G) = {}^tV_R(G)$. The equation
for ${}^tV_R$ is the dual¹ of the above equation. For example, Figure 2.8 shows
the values of ${}^tV_L(H_{11})$ and ${}^tV_R(H_{11})$ for different values of t.

Once the tax is set, it remains constant throughout play in a game.
However, not all moves in a game are worth the same amount. Hence,
it is common for the players to be willing to pay the tax only for some
portion of a game. For example, consider the game $H_{11}$ shown in Figure
2.8. Assume that Left pays t and moves to { 5 | -5 }. If t < 5, then Right
will respond to Left’s move. The taxes balance out, and the final score is
-5. However, if 5 < t < 10 then Right will not respond to Left’s move.
Since ${}^tV_R(\{ 5 \mid -5 \}) = 0$, the final score is 0 - t.


¹ Since Left is trying to maximize and Right is trying to minimize, an equation for Left
can be converted into an equation for Right by taking its dual. The dual of an equation
is obtained by transforming all < to >, ≤ to ≥, + to -, and - to +. Furthermore, all
Rs become Ls and all Ls become Rs in the notation ${}^tV_L$, ${}^tV_R$, ${}^tL_n$, and ${}^tR_n$.


Figure 2.8: Computing ${}^tV_L$ and ${}^tV_R$ for $H_{11}$ and $H_{12}$


2.7 Thermographs 


Thermographs provide a simple, efficient method for computing o(G), and
for computing ${}^tV_L(G)$ and ${}^tV_R(G)$ for all values of t. A thermograph is a
plot of ${}^tV_L$ and ${}^tV_R$ for increasing values of t where the tax is on the y axis,
and the value of the game is on the x axis. In order to keep ${}^tV_L$ to the left
of ${}^tV_R$, the values on the x axis are in decreasing order. For example, the
thermograph for { 5 | -5 } is shown in Figure 2.9.

Since taxing a number has no effect, the thermograph for a number is
a vertical line. The thermograph for G = { L | R } is obtained recursively
from the thermographs of L and R by applying Equation 2.3 and its dual.
The thermograph for G has three parts: its left edge, its right edge, and
its mast. What follows is a description of how each part is obtained.

By Equation 2.3, when t < o(G):

$${}^tV_L(G) = {}^tV_R(L) - t.$$

Hence, the left edge of G’s thermograph is obtained by subtracting t from
every point on the right edge of L’s thermograph. Graphically, subtracting
increasing values of t corresponds to changing lines of slope one into
vertical lines, and changing vertical lines into lines with slope minus one.

Similarly, by the dual of Equation 2.3, when t < o(G):

$${}^tV_R(G) = {}^tV_L(R) + t.$$

Hence, the right edge of the thermograph for G is obtained by adding t
to every point on the left edge of R’s thermograph. Graphically, adding
increasing values of t corresponds to changing lines of slope minus one
into vertical lines, and changing vertical lines into lines with slope
one.

The plots of ${}^tV_L(G)$ and ${}^tV_R(G)$ intersect at t = o(G). Since for all t > o(G):

$${}^tV_L(G) = {}^tV_R(G) = {}^{o(G)}V_L(G)$$

each thermograph is surmounted by an infinite vertical mast that starts at
t = o(G).


Figure 2.9: Thermograph for { 5 | -5 }

To construct the thermograph for a game, start at the bottom of the
tree and work up. For example, consider the game $H_{13}$ shown in Figure
2.10. The thermograph for $R_1(H_{13})$ is obtained from the thermographs
for { 5 | -5 } and -20 as shown in Figure 2.11. The thermograph for $H_{13}$
is obtained from the thermographs for $R_1(H_{13})$ and $L_1(H_{13})$ as shown in
Figure 2.12.


Figure 2.10: Defining $H_{13}$


Figure 2.11: Constructing the Thermograph for { { 5 | -5 } | -20 }


[Plot for $H_{13}$ = { { 20 | 10 } | { { 5 | -5 } | -20 } }: tax t on the vertical axis, with levels marked at t = 5 and t = 10; value on the horizontal axis, decreasing from 20 to -25.]

Figure 2.12: Constructing the Thermograph for $H_{13}$


The line representing ${}^tV_L(G)$ in a thermograph either goes vertically up
or goes up and to the right (slope -1). The line representing ${}^tV_R(G)$ either
goes vertically up or goes up and to the left (slope 1). Furthermore, since
the lines meet when t = o(G), it is easy to show the following bounds on
${}^tV_L(G)$:

$${}^tV_L(G) \ge {}^{o(G)}V_L(G) \ge {}^tV_R(G) \qquad (2.4)$$

$${}^tV_L(G) \le {}^0V_L(G) \le {}^tV_L(G) + t \qquad (2.5)$$
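
The construction in this section can also be carried out numerically. The following sketch is not the graphical method described above. It evaluates the left and right edges from Equation 2.3 and its dual, and it locates the foot of the mast by bisection on the tax, which is a swapped-in numerical technique. It assumes the illustrative encoding of a game as a number or a pair (Left options, Right options), and it takes $H_{13}$ to be { { 20 | 10 } | { { 5 | -5 } | -20 } }. All function names are hypothetical.

```python
from functools import lru_cache

def is_number(g):
    return isinstance(g, (int, float))

def leaves(g):
    """All numbers appearing at the leaves of the game tree."""
    if is_number(g):
        return [g]
    return [x for side in g for option in side for x in leaves(option)]

def left_edge(g, t):
    # Equation 2.3 below the temperature: Right's taxed value after
    # Left's best move, minus the tax Left paid.
    return max(taxed(option, t)[1] for option in g[0]) - t

def right_edge(g, t):
    # Dual of Equation 2.3.
    return min(taxed(option, t)[0] for option in g[1]) + t

@lru_cache(maxsize=None)
def temperature_and_mean(g):
    """Find where the two edges meet, by bisection on the tax t."""
    lo, hi = 0.0, float(max(leaves(g)) - min(leaves(g)))
    for _ in range(100):
        mid = (lo + hi) / 2
        if left_edge(g, mid) > right_edge(g, mid):
            lo = mid    # edges have not met yet, so the temperature is higher
        else:
            hi = mid
    return hi, (left_edge(g, hi) + right_edge(g, hi)) / 2

def taxed(g, t):
    """Return the pair (taxed value for Left, taxed value for Right)."""
    if is_number(g):
        return g, g
    temp, mean = temperature_and_mean(g)
    if t >= temp:
        return mean, mean    # on the mast
    return left_edge(g, t), right_edge(g, t)

R1 = ((((5,), (-5,)),), (-20,))       # { { 5 | -5 } | -20 }
H13 = ((((20,), (10,)),), (R1,))      # assumed form of H13
print(temperature_and_mean(R1))
print(temperature_and_mean(H13))
print(taxed(H13, 3))
```

The results are floating point approximations. An exact version would track the breakpoints of the piecewise linear edges instead of bisecting.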


2.8 Mean Value Revisited 


One surprising fact is that the value of a game when a tax of t > o(G) is
imposed equals the mean value of the game, i.e.,

$$\forall t > o(G): \mathrm{Mean}(G) = {}^tV_L(G) = {}^tV_R(G). \qquad (2.6)$$


This result is proved by Hanner [7]. 

[2] contains a short argument for why Equation 2.6 is true. Unfortunately,
the argument is unconvincing. The heart of the argument is that
taxation is a linear function. However, the argument assumes that if in the
game A + B a tax t with o(A) < t < o(B) is imposed on every move then play
in the game is equivalent to play in the game:

$${}^{o(A)}V_L(A) + B.$$

However, this assumes the result. If taxation is not linear, then one has
no way of knowing what play in A + B is like when a tax is imposed.

Conway [6] provides a more convincing argument. However, special care
is still needed to handle the case when t is close to o(A).

Since the value of a game when a tax of t > o(G) is imposed equals the
mean value of a game, thermographs provide an efficient way of computing
mean values. In particular, to compute the mean value of a game G, simply
draw the thermograph for G and read off the graph ${}^tV_L$ for t > o(G). For
example, the thermograph for $H_{13}$ (Figure 2.12) shows that
$\mathrm{Mean}(H_{13}) = {}^{o(H_{13})}V_L(H_{13}) = 2.5$, and the thermograph for $R_1(H_{13})$ (Figure 2.11) shows
that $\mathrm{Mean}(R_1(H_{13})) = -10$.
Chapter 3


Easy Games - Hard Games 


In general, SUM is PSPACE-complete. However, there exist restricted
classes of SUM that have polynomial time solutions. This chapter analyzes
a few restricted classes and shows that some have polynomial time solutions 
while others seem hard to play. 


3.1 Switches 


A game { x | y } where x, y are numbers and x > y is called a switch. A
sum of switches, though not hard to play, is an important special case that
is used repeatedly throughout this thesis.

Any switch { x | y } can be unbiased by converting it into

$$u + \{ v \mid -v \}$$

where u equals (x + y)/2 and v, the temperature of the switch, equals
(x - y)/2. For example,

$$\{ 20 \mid -12 \} \Rightarrow 4 + \{ 16 \mid -16 \}.$$


Left does not care if the game consists of {20 | —12} or if the game consists 
of { 16 | —16 } but he is given 4 additional points at the start of the game. 
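
The unbiasing step is simple arithmetic, sketched below for illustration. The function name `unbias` is hypothetical, and integer endpoints are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def unbias(x, y):
    """Split the switch { x | y } into a bias u and a temperature v,
    so that { x | y } = u + { v | -v }. Assumes integer x > y."""
    if x <= y:
        raise ValueError("a switch needs x > y")
    u = Fraction(x + y, 2)    # the points handed to Left up front
    v = Fraction(x - y, 2)    # the temperature of the switch
    return u, v

print(unbias(20, -12))    # → (Fraction(4, 1), Fraction(16, 1))
```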

In general, it is convenient to unbias a switch. Though the value of u 
affects the value of the final score, it does not affect optimal play and can 
be ignored in all discussions of strategy.


Figure 3.1: Stacked Coins


Consider the following game composed of n unbiased switches:

$$\{ x_1 \mid -x_1 \} + \cdots + \{ x_n \mid -x_n \}$$

where $x_1 > \ldots > x_n$. [2] makes a good analogy between this game and a
game where a bunch of coins ($x_1, \ldots, x_n$) are placed on a table. On each
move a player can take a coin off the table and place it in his pocket. The
final score represents the difference between the amount of money in Left’s
pocket and the amount in Right’s pocket.

The optimal strategy for both players is to choose the largest switch
(coin). If Left plays first, then the value of the game is equal to the alternating
sum:

$$ALT = x_1 - x_2 + \cdots \pm x_n.$$


If Right plays first, then the value of the game is the negative of ALT. 
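
The coin-taking rule gives a direct way to evaluate a sum of unbiased switches. The sketch below sorts the coins and alternates signs; the function name `alternating_sum` is an illustrative choice.

```python
def alternating_sum(coins):
    """Value of a sum of unbiased switches { x | -x } when Left moves first.

    Both players always take the largest remaining coin, so Left collects
    the 1st, 3rd, 5th, ... largest and Right collects the rest.
    """
    ordered = sorted(coins, reverse=True)
    return sum(x if k % 2 == 0 else -x for k, x in enumerate(ordered))

coins = [5, 16, 3]
print(alternating_sum(coins))     # value when Left plays first
print(-alternating_sum(coins))    # value when Right plays first
```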


3.2 Stacks of Coins 


One simple generalization of the above game is to imagine several stacks of 
coins placed on a table as in Figure 3.1. On each turn a player can remove 
one of the exposed coins and place that coin in his pocket. The score is the 
difference between the amount of money in Left’s pocket and the amount 
in Right’s pocket. 

The tree representation of a stack of two coins, x and y where x is on y,
is shown in Figure 3.2. If x < y, then the game is a number. For example,
in the game shown in Figure 3.1, Left is unwilling to take the 25 coin since
Right would then be able to take the 30 coin. From Left’s point of view
this is a net loss of 5.


Figure 3.2: Representing a Stack of Coins


Like the previous game, this game is easy to play. The optimal strategy
is simply the greedy strategy, i.e., take the largest coin on a stack which is
not a number. For example, in the game shown in Figure 3.1 Left’s optimal
move is to take the 20 coin on top of the third stack.


3.3 Left Heavy Games 


A left heavy game is a game of the form { { x | y } | z } where x, y, z are
numbers and x > y > z. A left heavy game is the simplest game for which
the notion of a sente move for Left applies, e.g., Figure 3.3. Hence, left heavy
games add a good deal of complexity to a sum of games.

An f(n)-left heavy game is a sum composed of n games where f(n) of them
are left heavy games and the rest are switches. The goal of this section is 
to analyze such games for various functions f(n). 


3.3.1 C-Left Heavy Games 


A c-left heavy game, for any constant c, is handled efficiently by computing 
a formula for the score of the game. Without loss of generality, assume 
that the switches in a c-left heavy game have been unbiased. Then a c-left 
heavy game has the form: 


$$\{\{x_1 \mid y_1\} \mid z_1\} + \cdots + \{\{x_c \mid y_c\} \mid z_c\} + \{t_1 \mid -t_1\} + \cdots + \{t_{n-c} \mid -t_{n-c}\}$$

where $t_1 > \ldots > t_{n-c}$.


After Left moves in $H_{14}$, he threatens to take ∞. Right
is compelled to respond to prevent the threat. Hence,
Left’s move is sente.

Figure 3.3: A Left Heavy Tree where Left’s Move is Sente


Theorem 3.1 In a c-left heavy game, it is at least as good, if not better, for
Left to play in the largest switch than to play in any other switch.


Proof: Let G be a c-left heavy game, and let $G^i$ be the game that
results when Left moves in the $i^{th}$ switch. The theorem states that from
Left’s point of view $G^1$ is at least as good as $G^i$.

It is sufficient to show that in $G^1 + (-G^i)$ Left has a winning strategy
even when Right starts [5, page 78]. Consider $G^1 + (-G^i)$ with the $j^{th}$
component game of $G^1$ paired with the $j^{th}$ component game of $-G^i$, i.e.,

$$\begin{array}{rcl}
\{\{x_1 \mid y_1\} \mid z_1\} & + & -\{\{x_1 \mid y_1\} \mid z_1\} \; + \\
\vdots & & \vdots \\
\{\{x_c \mid y_c\} \mid z_c\} & + & -\{\{x_c \mid y_c\} \mid z_c\} \; + \\
t_1 & + & -\{t_1 \mid -t_1\} \; + \\
\{t_2 \mid -t_2\} & + & -\{t_2 \mid -t_2\} \; + \\
\vdots & & \vdots \\
\{t_{i-1} \mid -t_{i-1}\} & + & -\{t_{i-1} \mid -t_{i-1}\} \; + \\
\{t_i \mid -t_i\} & + & -t_i \; + \\
\{t_{i+1} \mid -t_{i+1}\} & + & -\{t_{i+1} \mid -t_{i+1}\} \; + \\
\vdots & & \vdots \\
\{t_{n-c} \mid -t_{n-c}\} & + & -\{t_{n-c} \mid -t_{n-c}\}
\end{array}$$

Except for the $(c+1)^{th}$ and $(c+i)^{th}$ component games, each component
game is paired with its negative. Whenever Right plays in such a pair,
Left can respond with the negative of Right’s move in the other game, and
cancel any gain Right makes by going first in the pair.

Hence, the problem is reduced to showing that Left has a winning strategy
when Right moves first in the following game:

$$t_1 + (-\{t_1 \mid -t_1\}) + \{t_i \mid -t_i\} - t_i$$

Since the negative of an unbiased switch is the switch itself, the game above
is equivalent to:

$$t_1 + \{t_1 \mid -t_1\} + \{t_i \mid -t_i\} - t_i$$

Right’s optimal strategy is to play in the largest switch, $\{t_1 \mid -t_1\}$, and
Left’s optimal response is to play in the remaining switch, $\{t_i \mid -t_i\}$. The
final score is $t_1 - t_1 + t_i - t_i = 0$ with Right to move. Thus Left wins. ∎

Using the above theorem, the optimal strategy for a c-left heavy game, 
for any value of c, can be computed. However, for simplicity, only the case 
when c = 1 is considered here. 

Given Theorem 3.1, optimal play in a 1-left heavy game proceeds in 
three distinct stages. First, players alternately play in the largest switch. 
At some point, a player will play in the left heavy tree. If Left plays there, 
the tree is reduced to a switch. If Right plays there, the tree is reduced to 
a number. In either case, the players then return to playing alternately in 
the largest switch. 

The key question is on what move should Left or Right play in the left 
heavy tree. To answer this question it is sufficient to compute formulas for 
the score of the game when Left and Right move in the left heavy tree at 
move i. 

It is convenient to have a shorthand notation for the amount the switches
$\{t_i \mid -t_i\}$ through $\{t_j \mid -t_j\}$ add to the score when Left starts play in the
game

$$\{t_1 \mid -t_1\} + \cdots + \{t_{n-c} \mid -t_{n-c}\}$$

In particular, let:

$$[i,j] = (t_i - t_{i+1} + \cdots \pm t_j) \cdot (-1)^{i+1}$$


If Right moves first in the left heavy tree at move i, then the score for
the first stage of the game is $[1,i-1]$. For the second stage, Right moves in
the left heavy tree and takes z. For the third stage, play resumes with Left
and hence the score is the negative of the alternating sum of switches from
i to n. Adding together the three stages and simplifying, we get:

$$\mathrm{RightScore}(i) = [1,n] + z - 2[i,n]$$


When Left plays in the left heavy tree, he throws a biased switch,
$\{x \mid y\}$, into the set of remaining unbiased switches. It is convenient to
view that switch as a constant plus an unbiased switch, i.e.,
$\{x \mid y\} = m_0 + \{t_0 \mid -t_0\}$ where $m_0 = (x+y)/2$ and $t_0 = (x-y)/2$. It is
also convenient to know where the unbiased switch $\{t_0 \mid -t_0\}$ fits into the
ordered list of unbiased switches. Let k be the number such that
$t_k > t_0 > t_{k+1}$.

If Left first moves in the left heavy tree at move $i > k$, then Right will
immediately respond in the $\{x \mid y\}$ switch and play will then proceed as
usual. So,

$$\mathrm{LeftScore}(i) = [1,n] + m_0 - t_0 \quad \text{when } i > k.$$


If Left first moves in the left heavy tree at move $i \le k$, then the score
for stage 1 is $[1,i-1]$. No score is generated for the second stage of the game,
but the unbiased switch $\{t_0 \mid -t_0\}$ is thrown into the game. The score for
the third stage is $-[i,k] + m_0 + (-1)^{k+1} t_0 + [k+1,n]$. Adding together
the score at each stage and simplifying produces:

$$\mathrm{LeftScore}(i) = [1,n] - 2[i,k] + m_0 + (-1)^{k+1} t_0 \quad \text{when } i \le k.$$
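The two closed forms can be checked numerically. The following is a minimal sketch rather than part of the original analysis: the function names (`alt`, `right_score`, `left_score`, `simulate`) and the sample values in the usage note are assumptions chosen for illustration. `simulate` plays out the three stages move by move exactly as they are described in the text, so agreement with the formulas is a consistency check on the algebra, not a proof of optimality.

```python
def alt(t, i, j):
    """[i,j]: what switches t_i..t_j (1-indexed, descending) add to the
    score when Left starts play at t_1."""
    return sum((-1) ** (k + 1) * t[k - 1] for k in range(i, j + 1))


def right_score(t, z, i):
    """RightScore(i) = [1,n] + z - 2[i,n], for even i."""
    n = len(t)
    return alt(t, 1, n) + z - 2 * alt(t, i, n)


def left_score(t, x, y, i):
    """LeftScore(i) for odd i; the tree becomes the switch {x|y}."""
    n = len(t)
    m0, t0 = (x + y) / 2, (x - y) / 2
    k = sum(1 for v in t if v > t0)  # t_k > t0 > t_{k+1}
    if i > k:
        return alt(t, 1, n) + m0 - t0
    return alt(t, 1, n) - 2 * alt(t, i, k) + m0 + (-1) ** (k + 1) * t0


def simulate(t, x, y, z, i):
    """Play the three stages directly; the player on move i uses the tree."""
    switches = sorted(t, reverse=True)
    score = 0
    # Stage 1: moves 1..i-1 alternate in the largest switch, Left first.
    for move in range(1, i):
        s = switches.pop(0)
        score += s if move % 2 == 1 else -s
    # Each pool entry is (temperature, value if Left takes, value if Right takes).
    pool = [(s, s, -s) for s in switches]
    if i % 2 == 1:
        # Stage 2 (Left): the tree becomes the switch {x|y}; Right moves next.
        pool.append(((x - y) / 2, x, y))
        left_to_move = False
    else:
        # Stage 2 (Right): Right takes z; Left moves next.
        score += z
        left_to_move = True
    # Stage 3: remaining switches are played out, hottest first.
    for _, left_value, right_value in sorted(pool, reverse=True):
        score += left_value if left_to_move else right_value
        left_to_move = not left_to_move
    return score
```

For instance, with the hypothetical values `t = [10, 8, 5, 3, 1]`, `x = 9`, `y = -3`, `z = -4` (so $m_0 = 3$, $t_0 = 6$, $k = 2$), `left_score(t, x, y, i)` matches `simulate(t, x, y, z, i)` for odd `i`, and `right_score(t, z, i)` matches it for even `i`.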


Assume that Left always goes first. For increasing i (i = 2, 4, 6, ...) the
value of RightScore(i) will decrease. Since Right is trying to minimize the
score, Right will delay moving in the left heavy tree. However, if on the next
move it is to Left's advantage to play in the left heavy tree, then Right will
play there in defense. Specifically, Right moves in the left heavy tree on
move i iff

RightScore(i+2) < LeftScore(i+1)


Similarly, Left's optimal strategy on move i is to move in the left heavy
tree iff:

LeftScore(i+2) > RightScore(i+1)

For example, consider the game shown in Figure 3.4. It is easy to
compute that:

LeftScore(1) = 6
RightScore(2) = 20
LeftScore(3) = 16
RightScore(4) = 10
LeftScore(5) = 16

Left can play in the largest switch on his first move but must move in the
left heavy tree on the third move.

Figure 3.4: A 1-Left Heavy Game


3.3.2 Log(n)-Left Heavy Games 


A log(n)-left heavy game can be played in polynomial time, though not as 
efficiently as a c-left heavy game. 

Under reasonable play, the log(n)-left heavy game can take on only a
polynomial number of different states. In particular, each of the log(n) left
heavy games can be in one of two states, i.e., either $\{\{x \mid y\} \mid z\}$ or
$\{x \mid y\}$. Furthermore, by Theorem 3.1, reasonable play in the switches
will consist of first playing out the switches in order. So there are only n
different ways in which the switches will appear. Hence, there are

$$2^{\log(n)} \cdot n$$

possible configurations.

Since there are only a polynomial number of different states, standard
dynamic programming techniques can be used to solve the problem in
polynomial time.


3.3.3 n-Left Heavy Games


There is no known polynomial time algorithm for determining the optimal 
strategy for n-left heavy games. This section will present some properties 
of left heavy games that suggest that finding a polynomial time strategy is 
hard. 

In order to study the properties of left heavy games it is convenient to
consider a special class of left heavy games that have the form
$\{\{x \mid 0\} \mid z\}$. This can be done without loss of generality since any left
heavy game can be put into this form via the following transformation:

$$\{\{x \mid y\} \mid z\} = y + \{\{x - y \mid 0\} \mid z - y\}$$

The constant y will not affect the strategy of either player and can be ignored.
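The following notation is used:

$$A \triangleright_L B \text{ iff an optimal move for Left in } A + B \text{ is in } A$$

It is easy to compute that if $A = \{\{a \mid 0\} \mid c\}$ and $B = \{\{d \mid 0\} \mid f\}$
then

$$A \triangleright_L B \iff \min(f + a, 0) \ge \min(c + d, 0)$$

In the best of all possible worlds, there would exist an evaluation function
F such that $A \triangleright_L B$ iff $F(A) > F(B)$. However, for this to be true
$\triangleright_L$ would have to be transitive between left heavy games. That is not
the case. For example,

$$\{\{1 \mid 0\} \mid -1\} \triangleright_L \{\{2 \mid 0\} \mid -1\} \triangleright_L \{\{2 \mid 0\} \mid -2\}$$

but it is not the case that

$$\{\{1 \mid 0\} \mid -1\} \triangleright_L \{\{2 \mid 0\} \mid -2\}$$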
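The failure of transitivity for Left can be confirmed by brute force. The sketch below is illustrative only: the game encoding (a number is a leaf, any other game is a pair of option tuples) and the names `sw`, `heavy`, `value`, `left_prefers`, and `formula` are assumptions introduced here. `value` is a plain minimax over the sum, and `formula` is the min-comparison for Left read as a non-strict inequality, which is the reading under which the three-game example holds.

```python
from functools import lru_cache


def sw(a, b):
    """The depth-one game {a | b}."""
    return ((a,), (b,))


def heavy(a, c):
    """The left heavy game {{a | 0} | c}."""
    return ((sw(a, 0),), (c,))


@lru_cache(maxsize=None)
def value(games, left_to_move):
    """Final score of a sum (a tuple of games) under optimal play.
    Left maximizes the score, Right minimizes it."""
    outcomes = []
    for idx, g in enumerate(games):
        if isinstance(g, tuple):  # not yet reduced to a number
            for option in g[0 if left_to_move else 1]:
                rest = games[:idx] + (option,) + games[idx + 1:]
                outcomes.append(value(rest, not left_to_move))
    if not outcomes:  # every component is a number: the sum is over
        return sum(games)
    return max(outcomes) if left_to_move else min(outcomes)


def left_prefers(A, B):
    """True iff some optimal first move for Left in A + B is in A."""
    best = value((A, B), True)
    return max(value((opt, B), False) for opt in A[0]) == best


def formula(a, c, d, f):
    """Closed form for A = {{a|0}|c} and B = {{d|0}|f}."""
    return min(f + a, 0) >= min(c + d, 0)
```

With `G1 = heavy(1, -1)`, `G2 = heavy(2, -1)`, and `G3 = heavy(2, -2)`, the search reports that Left prefers `G1` over `G2` and `G2` over `G3`, but not `G1` over `G3`.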


Even given that $\triangleright_L$ is not transitive, it is natural to conjecture that
if $A \triangleright_L B$ and $A \triangleright_L C$ then in the sum $A + B + C$ the optimal move is
in A. However, this is not true. For example, consider the games shown in
Figure 3.5. In the sum of any pair containing $H_{15}$, Left's optimal move is
in $H_{15}$. However, in the sum $H_{15} + H_{16} + H_{17}$, Left's optimal move is in
$H_{16}$.

Figure 3.5: Defining $H_{15}$, $H_{16}$, and $H_{17}$


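From Right's point of view, the world is a bit simpler. It is easy to show
that if $A = \{\{a \mid 0\} \mid c\}$ and $B = \{\{d \mid 0\} \mid f\}$ then

$$A \triangleright_R B \iff c > f$$

The $\triangleright_R$ operator is transitive for Right. However, $A \triangleright_R B$ and
$A \triangleright_R C$ does not imply that A is the optimal move for Right in the game
$A + B + C$. For example:

$$\{\{4 \mid 0\} \mid -61\} \triangleright_R \{\{73 \mid 0\} \mid -66\},$$

and

$$\{\{4 \mid 0\} \mid -61\} \triangleright_R \{\{18 \mid 0\} \mid -95\},$$

but in the sum

$$\{\{4 \mid 0\} \mid -61\} + \{\{73 \mid 0\} \mid -66\} + \{\{18 \mid 0\} \mid -95\},$$

Right's optimal move is in the second game.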

All of the above examples show that understanding the context in which
two or more left heavy games appear is paramount to understanding
the relative importance of the games. One further example is shown in
Figure 3.6. The left heavy games A,B,C, and D, are placed in four different 
contexts. In the first sum, the optimal move is in game A. In the second 
sum, the optimal move is in game B. In the third sum, the optimal move 
is in game C. In the fourth sum, the optimal move is in game D. 

Figure 3.6: The Importance of Context. The optimal move in each game
appears in the box.

Chapter 4
Complexity of SUM 


Lockwood Morris [17] shows that determining optimal play in a disjunctive
sum of games is PSPACE-complete. His proof holds when the component
games in the sum are of depth 4 or more, but it leaves open the possibility
that instances of SUM composed of shorter games could (even assuming
P ≠ PSPACE) be handled in polynomial time.

This chapter narrows the gap between those problems known to be in P
and those problems known to be PSPACE-complete. Let $\mathrm{SUM}_{d \le n}$ refer
to the problem of determining optimal play in a sum of games, where each
component game has depth less than or equal to n. The natural question
arises as to how small n can be and still have $\mathrm{SUM}_{d \le n}$ be
PSPACE-complete. This chapter will prove that $\mathrm{SUM}_{d \le 2}$ is PSPACE-complete.

The result is important for two reasons. Previously, the only known way
of determining the optimal play in an instance of $\mathrm{SUM}_{d \le 2}$ or $\mathrm{SUM}_{d \le 3}$ was
via an exponential time search. However, there was no convincing evidence
that the problem was hard enough to warrant such an algorithm. Now, with
a free conscience, one can accept an algorithm with an exponential running
time.

The result also affects the types of heuristic solutions that are possible.
The next chapter presents a relatively complex heuristic that determines a
good move in a sum of games by using taxation to approximate each
component game. One could imagine a simpler algorithm that approximated each
component game with a depth two or depth three game. Since a depth two
or a depth three game maintains the notion of sente, the approximation
could be quite accurate. However, since $\mathrm{SUM}_{d \le 2}$ is PSPACE-complete
this approach will fail. The approximated version of the game will be
(roughly speaking) as hard to solve as the original version.

What follows is the proof that $\mathrm{SUM}_{d \le 2}$ is PSPACE-complete. The
proof proceeds in two parts. The first part proves that $\mathrm{SUM}_{d \le 2}$ is as hard
as SAT, i.e., the problem of determining if a boolean formula is satisfiable.
The second part expands upon the first part to show that $\mathrm{SUM}_{d \le 2}$ is
equivalent to QBF, i.e., the problem of determining if a quantified boolean
formula is true.


4.1 $\mathrm{SUM}_{d \le 2}$ is NP-hard


Morris transforms an arbitrary instance of PARTITION consisting of
numbers $x_1, \ldots, x_n$ and the value $S = \frac{1}{2}\sum x_i$ into an instance of SUM. He
shows that under optimal play, Left chooses a subset of the x's, whose sum
is L, to remain in play. Right then has the option of playing such that the
final score is either $S - L$ or $L - S$. Left wins iff both of Right's options
result in a final score that is greater than or equal to zero. Hence, Left wins
iff $S = L$ and he has partitioned the x's.

This section will prove that $\mathrm{SUM}_{d \le 2}$ is NP-hard. The proof given here
is very similar to Morris's proof. However, the following proof reduces the
size of the component games by dealing with alternating sums of numbers.
This allows the more complex games used by Morris to be replaced by
switches.

The proof proceeds in three steps. First it is shown that ALT is
NP-complete. Namely, given a set of integers and a value, the problem of
deciding if there is a subset of the integers such that their alternating sum
(taken in descending order) equals the given value is NP-complete.

Next, SUBSET SWITCH is shown to be NP-complete. Namely, given
a value, B, and a set of switches, the problem of deciding if there exists a
subset of switches, X, such that $V_L(X) = B$ is NP-complete.¹

Finally, $\mathrm{SUM}_{d \le 2}$ is shown to be NP-hard. An arbitrary instance of
SUBSET SWITCH, consisting of a set of switches and a value B, is
transformed into an instance of $\mathrm{SUM}_{d \le 2}$. A game is constructed where optimal
play proceeds as follows: Left plays and leaves a subset of switches, X, such
that $V_L(X) = L$. Right then has the choice of playing such that the final score
is either $B - L$ or $L - B$. Left wins iff both of Right's alternatives result in a
final score that is greater than or equal to zero. So, Left wins iff he has
been able to solve the SUBSET SWITCH problem.

¹ $V_L$ is defined in Section 2.1. When applied to a set, it is the final score
of the sum composed of all elements in the set assuming that Left starts and
both players play optimally.


Lemma 4.1 ALT is NP-complete. Given the set Y of integers and a value
B, the problem of deciding if there exists a subset
$Y' = \{y'_0, y'_1, \ldots, y'_m\} \subseteq Y$ where $y'_0 > y'_1 > \cdots > y'_m$ and

$$B = y'_0 - y'_1 + \cdots \pm y'_m$$

is NP-complete.


Proof: ALT is in NP. A non-deterministic algorithm simply guesses a 
subset Y’ and checks in polynomial time that its alternating sum is equal 
to B. 

Consider the PARTITION problem. An instance consists of a set of
integers $X = \{x_0, \ldots, x_n\}$ where $x_0 > \cdots > x_n$. Karp [10] showed that
the problem of determining if there is a subset $X' \subseteq X$ such that

$$\sum_{x_i \in X'} x_i = \frac{1}{2} \sum_{x_i \in X} x_i$$

is NP-complete.
It is sufficient to reduce PARTITION to ALT. In particular, the
following construction creates an instance of ALT consisting of a value B
and a set of integers Y such that it can be solved iff the given instance of
PARTITION can be solved:

1. $B = \frac{1}{2} \sum x_i$

2. For each integer $x_i \in X$, add to the set Y the integers $y_{i1}$ and $y_{i2}$.
   Assume k is the maximum number of bits required to represent B
   and $x_0$. Let $y_{i1}$ and $y_{i2}$ be s bits long where $s = 2n + \log(n) + k$.
   Then:

   $$y_{i1} = 2^{s-2i} + x_i$$
   $$y_{i2} = 2^{s-2i}$$

   The lower order k bits of $y_{i1}$ contain the value $x_i$, and the higher order
   bits are 0 except for the $(s - 2i)$th bit. All bits in $y_{i2}$ are zero except
   for the $(s - 2i)$th. Figure 4.1 shows the construction of the $y_{i1}, y_{i2}$
   pairs. Note, $\log(n)$ bits are left between the high order positional bits
   and the lower order bits containing $x_i$ to guard against overflow.

Figure 4.1: Constructing an Instance of ALT


It is important to note that the y's decrease in size rapidly, and that
they are all significantly larger than the value of B. If $y_{i1}$ is added to the
alternating sum, then to reach the value of B it is necessary to subtract
the value $y_{i2}$. Any solution of the instance of ALT constructed above either
contains both $y_{i1}$ and $y_{i2}$ or contains neither value. Furthermore, since
$x_i = y_{i1} - y_{i2}$, the value of the alternating sum of the $y_{i1}, y_{i2}$ pairs is the
sum of the $x_i$'s coded in the lower k bits of the $y_{i1}$ values.

The given instances of ALT and PARTITION are equivalent problems.
The solution for one implies the solution for the other. A solution for the
instance of ALT constructed above will contain a number of $y_{i1}, y_{i2}$ pairs.
The corresponding solution to PARTITION is the set of $x_i$ values coded in
the k lower order bits of the $y_{i1}$'s. Similarly, given a solution, $X'$, for an
instance of PARTITION, the corresponding solution to ALT is the set of
$y_{i1}, y_{i2}$ values constructed from the $x_i \in X'$.

Since the construction is accomplished in polynomial time, and since
ALT can be solved if and only if PARTITION can be solved, ALT is
NP-complete. ∎
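The reduction is small enough to try on toy instances. The sketch below is an illustration under stated assumptions, not the thesis's own procedure: the function names are invented here, the guard width uses the bit length of n + 1 (slightly more generous than the log(n) bits in the text), and both problems are decided by exhaustive search, which is only feasible for a handful of integers.

```python
from itertools import combinations


def build_alt_instance(xs):
    """Build (Y, B) from distinct positive integers x_0 > ... > x_n
    whose total is even."""
    xs = sorted(xs, reverse=True)
    total = sum(xs)
    assert total % 2 == 0, "an odd total admits no partition"
    B = total // 2
    n = len(xs) - 1
    k = max(B, xs[0]).bit_length()   # bits needed for B and x_0
    guard = (n + 1).bit_length()     # assumption: guard bits against overflow
    s = 2 * n + guard + k
    ys = []
    for i, x in enumerate(xs):
        ys.append(2 ** (s - 2 * i) + x)  # y_i1: positional bit plus x_i
        ys.append(2 ** (s - 2 * i))      # y_i2: positional bit only
    return ys, B


def alternating_sum(values):
    """Alternating sum taken in descending order, starting with +."""
    ordered = sorted(values, reverse=True)
    return sum(v if j % 2 == 0 else -v for j, v in enumerate(ordered))


def alt_solvable(ys, B):
    """Exhaustive search over non-empty subsets of ys."""
    return any(alternating_sum(c) == B
               for r in range(1, len(ys) + 1)
               for c in combinations(ys, r))


def partition_solvable(xs):
    """Exhaustive search for a subset summing to half the total."""
    half, rem = divmod(sum(xs), 2)
    return rem == 0 and any(sum(c) == half
                            for r in range(len(xs) + 1)
                            for c in combinations(xs, r))
```

On the hypothetical inputs `[5, 4, 3, 2]` (which splits as 5 + 2 = 4 + 3) and `[8, 3, 2, 1]` (which cannot be split), `alt_solvable` on the constructed instance agrees with `partition_solvable` on the original.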
Corollary 4.1 SUBSET SWITCH is NP-complete. In particular, given
a value B and a set of unbiased switches, the problem of deciding if there
exists a subset of switches whose sum equals B is NP-complete.


Proof: The value for Left in a sum of unbiased switches is the alternating
sum of the component values of the switches. For example:

$$V_L(\{9 \mid -9\} + \{4 \mid -4\} + \{3 \mid -3\} + \{2 \mid -2\}) = 9 - 4 + 3 - 2$$

The problem of choosing a subset of switches such that the value for Left
when Left plays first in their sum, i.e., the alternating sum of their
components, equals a given value is equivalent to the ALT problem. ∎
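The identity between $V_L$ and the alternating sum can be checked by exhaustive search on small sums. This is a minimal sketch with invented names (`v_left_first`, `alternating_sum`); an unbiased switch $\{t \mid -t\}$ is represented simply by its value t.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def v_left_first(switches, left_to_move=True):
    """Minimax value of a sum of unbiased switches {t | -t}, given as a
    tuple of their values t. Left maximizes, Right minimizes."""
    if not switches:
        return 0
    outcomes = []
    for i, t in enumerate(switches):
        rest = switches[:i] + switches[i + 1:]
        gain = t if left_to_move else -t
        outcomes.append(gain + v_left_first(rest, not left_to_move))
    return max(outcomes) if left_to_move else min(outcomes)


def alternating_sum(values):
    """t_1 - t_2 + t_3 - ... with the values taken in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(t if i % 2 == 0 else -t for i, t in enumerate(ordered))
```

For the example in the proof, both `v_left_first((9, 4, 3, 2))` and `alternating_sum([9, 4, 3, 2])` evaluate 9 - 4 + 3 - 2.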


Theorem 4.1 $\mathrm{SUM}_{d \le 2}$, i.e., determining the optimal play in a sum of
games where each component game has depth less than or equal to 2, is
NP-hard.


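Proof: Transform an arbitrary instance of SUBSET SWITCH, consisting
of a value S (the target value called B in Corollary 4.1) and a set of
switches $\{x_1 \mid -x_1\}, \ldots, \{x_n \mid -x_n\}$ where $x_1 > \cdots > x_n$, into an
instance of $\mathrm{SUM}_{d \le 2}$. Namely, construct the game:

$$J + G_{11} + G_{12} + \cdots + G_{n1} + G_{n2} + H + I + \dagger$$

where

$$\begin{aligned}
J &= \{\{\infty \mid \textstyle\sum_{i=1}^{n} A_i - C\} \mid -\infty\} \\
G_{i1} &= \{\{x_i \mid -x_i\} \mid -A_i\} \\
G_{i2} &= \{0 \mid -A_i\} \\
H &= \{C \mid -C\} \\
I &= \{D \mid -S, \{S \mid -E\}\} \\
\dagger &= \{0 \mid \{0 \mid 0\}\}
\end{aligned}$$

and

$$\begin{aligned}
A_i &> 4 \cdot A_{i+1} \\
A_n &> 4 \cdot (C + D + E + S + \textstyle\sum x_i) \\
C &> D + \textstyle\sum x_i \\
D &> 4 \cdot E \\
E &> 2 \cdot \textstyle\sum_{i=1}^{n} x_i
\end{aligned}$$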


In the above sum, the game $\dagger$ will never affect the score. Furthermore,
since neither player can gain an advantage by playing in it, it will not be
played until all other games have ended. Thus, it will not affect either
player's strategy and will be ignored in all discussions of strategy.

However, $\dagger$ does affect the outcome of the above game. Left wins a sum
of games if the final score is greater than zero, or if the final score equals
zero and it is Right's turn to play when the game ends. $\dagger$ guarantees that
when the game ends, it is Right's turn to move. Thus, a final score of zero
will be a victory for Left.

To show that Left has a winning strategy iff he can choose a subset 
of switches whose value is S, it will be convenient to first consider how 
the players are expected to play. This will be called the normal strategy. 
Afterwards it will be shown that neither player can hope to benefit from 
deviating from the normal strategy. 

Normal play proceeds in three stages: beginning, middle and final. The
beginning stage consists of Left playing in J and Right responding in the
same game, picking up $\sum A_i - C$.

The middle stage consists of Left and Right both playing in the G games.
Left begins by playing in either $G_{11}$ or $G_{12}$. Right responds in the other
game. Left then chooses to play in either $G_{21}$ or $G_{22}$ and Right responds in
the other one. Play proceeds down the pairs of G's until all the G games
are played. This ends the middle stage.

Notice that Left's choice between $G_{i1}$ and $G_{i2}$ is the choice between
including or excluding the switch $\{x_i \mid -x_i\}$ in the final game. In normal
play Left has total control over which switches are included and which are
excluded.

When the final stage begins, the game has been reduced to the sum of
I, H, and the subset of switches that Left chose to leave in the game. For
convenience, let L be $V_L$ of the sum of those switches.

Play begins in the final stage with Left playing in H.

Right then has a major decision to make. He must choose between
playing in the first option and playing in the second option of I. If he plays
in the first option ($-S$), Left will start the play in the remaining switches.
The switches will be played out and the final score will be $-S + L$. If he
plays in the second option ($\{S \mid -E\}$), Left will respond in $\{S \mid -E\}$
taking S and Right will start the play in the switches. The final score will
be $S - L$.

Left will win iff both of Right's options (taking $S - L$ or $L - S$) result
in a non-negative final score. This is only true if Left solved the SUBSET
SWITCH problem.

To show that the normal play described above is optimal for both play- 
ers, it will be shown that in each stage of play neither player can gain an 
advantage by deviating from the strategy described above. 

The play in the beginning stage of the game is obviously optimal. If
Left does not move in J, Right will gain $-\infty$ and win. Similarly, after Left
plays in J, he is threatening to gain $\infty$. Right must respond in the same
game to prevent this threat.

To analyze the play in the middle stage of the game, note that after the
beginning stage, Left has a huge advantage ($\sum A_i - C$). With normal play
Right recoups his loss by gaining one $-A_i$ for each $(G_{i1}, G_{i2})$ pair.
For Right, the major drive in the middle stage is his need to regain at least
$-\sum A_i$.

If Left is playing in the normal way, can Right gain an advantage by
not responding directly to Left's moves in the pair $G_{i1}, G_{i2}$? The answer is
no. If Right plays elsewhere, Left will play in the other $G_i$ game preventing
Right from getting $-A_i$. This is bad for Right since:

$$A_i > \Big(2 \cdot \sum_{k=i+1}^{n} A_k\Big) + C + D + E + S + \sum_{i=1}^{n} x_i$$

Even if Right could play first in all the other games, he would gain less than
$A_i$. It is not only better for Right to play in $G_i$ than to play somewhere
else, it is better for Right to play in $G_i$ than to play everywhere else.

Note that the above argument is symmetrical. If for some reason, Right
was the first player to play in pair $G_{k1}, G_{k2}$, then Left would be forced to
respond by playing in the other $G_k$ game. If Left did not do so, he would
be unable to compensate for Right having gained $2 \cdot (-A_k)$. The fact that
a move in the pair $G_{i1}, G_{i2}$ forces the opponent to respond in that pair of
games is the key fact needed to prove that Left cannot gain an advantage
by deviating from normal play.

In particular, suppose that Left and Right have played in the normal way
in the games $(G_{11}, G_{12}, \ldots, G_{i-1,1}, G_{i-1,2})$ and suppose that Left decided
to play somewhere besides in the pair $G_{i1}, G_{i2}$. There are three cases that
must be analyzed.
First suppose that Left decided to play in the $G_{k1}, G_{k2}$ pair where $k > i$.
It is easy to see that playing in the $G_{k1}, G_{k2}$ pair early does not gain Left
any advantage but only increases Right's options. Right can respond in
the pair $G_{i1}, G_{i2}$. By the previous argument, Left is forced to respond in
the same pair. Right can then play in the pair $G_{i+1,1}, G_{i+1,2}$ forcing Left
to respond. Right can continue to force the play until the $G_{k1}, G_{k2}$ pair
is reached. Hence, Right can force a line of play that is equivalent to
the normal play except that Right had control over which of the switches
$\{x_j \mid -x_j\}$ where $i \le j < k$ were included in the remaining game. Left
gains nothing by giving up his control.

The second case is that Left deviated from normal play in order to play
in H early. Using the same reasoning as above, Right can force a line of
play that is equivalent to the normal play except that Right has control for
the pairs $G_{i1}, G_{i2}$ through $G_{n1}, G_{n2}$. Left gains nothing by giving up control
in this way.

The third possibility is that Left deviates from normal play in order to
play in I, $\dagger$, or one of the $\{x_i \mid -x_i\}$ switches. This is truly disastrous
for Left. Not only does Right gain control for the pairs $G_{i1}, G_{i2}$ through
$G_{n1}, G_{n2}$, Right also gains the option of playing in H and gaining $-C$. Even
if Left could then play first in every other game, he could not recoup his
loss since $C > D + \sum x_i$.

When the final stage of the game commences, it is Left's turn to play
and the game has been reduced to the sum of H, I, $\dagger$, and a subset of the
switches.

With normal play Left would play in H and gain C. The only reason for
Left not playing in H and allowing Right to play there is that he hopes to
gain more than C. However, C is greater than the sum of all of Left's options
in all the remaining games, i.e., $C > D + \sum x_i$. Not only is playing in H
better than playing somewhere else, playing in H is better than playing
everywhere else.

Normal play then dictates that Right plays in I. If Right plays in any
other game, Left will play in I and pick up $D > 8 \cdot \sum x_i$. Even if Right
could play first in every other game, he would not gain more than D. Hence,
playing in I is optimal.

After I has been played, the game is a sum of switches. The optimal
strategy is simply the optimal strategy for playing in a sum of switches.
This is equivalent to the normal play described above.
Thus, normal play does describe the optimal play for both players. Left
can win iff he can solve the SUBSET SWITCH problem. ∎


4.2 $\mathrm{SUM}_{d \le 2}$ is PSPACE-complete


This section will prove that $\mathrm{SUM}_{d \le 2}$ is PSPACE-complete. Stockmeyer
and Meyer [27] proved that QBF, the problem of determining if a quantified
boolean formula is true, is PSPACE-complete. This section will reduce
QBF to $\mathrm{SUM}_{d \le 2}$.

Again, the proof is very similar to Morris's proof that SUM is
PSPACE-complete [17], but alternating sums of numbers are used in the reductions.
This allows the complex games used by Morris to be replaced by switches.
Hence, the depth of the component games is reduced.

The proof consists of reducing QBF to 2P-EXACT-COVER, which is
reduced to 2P-GEN-PARTITION, which is reduced to 2P-ALT, which is
reduced to 2P-SUBSET-SWITCH, which is reduced to $\mathrm{SUM}_{d \le 2}$. This is a
rather long series of reductions. However, each step is analogous to a step in
the NP-hardness proof given in Section 4.1. The major difference between
the two proofs is that the NP-hardness proof reduces SAT to $\mathrm{SUM}_{d \le 2}$ via
a series of one person games, whereas this proof reduces QBF to $\mathrm{SUM}_{d \le 2}$
via a series of two person games. The structural similarity of the two proofs
is shown in Figure 4.2.

The games used in the PSPACE-completeness proof (2P-EXACT-COVER,
2P-GEN-PARTITION, 2P-ALT, 2P-SUBSET-SWITCH) are the
two person versions of the games used in the NP-hardness proof. They all
have the same basic form. An instance of the game consists of the following
set of objects:

$$X_1, \bar{X}_1, \ldots, X_n, \bar{X}_n, Y_1, \ldots, Y_m$$

and possibly the value B. Play begins with Left selecting either $X_1$ or $\bar{X}_1$.
Right then selects either $X_2$ or $\bar{X}_2$. Left and Right continue to alternate
turns until all the X objects have been selected. Left then selects some
number of the Y objects. Left wins iff all the selected objects satisfy some
condition.

The only difference between the games is what the objects are, and what
the winning condition is. In 2P-EXACT-COVER the objects are subsets of
a set, and Left wins iff the selected subsets form an exact cover of the set.
In 2P-GEN-PARTITION the objects are numbers, and Left wins iff the
sum of selected numbers equals B.² In 2P-ALT the objects are numbers,
and Left wins iff the alternating sum of the selected numbers equals B.
Finally, in 2P-SUBSET-SWITCH the objects are switches, and Left wins
iff $V_L$ of the sum of the selected switches equals B.

NP-Hard Proof                PSPACE-Complete Proof

SAT                          QBF
   Karp [10]                    Lemma 4.2
EXACT COVER                  2P-EXACT-COVER
   Karp [10]                    Lemma 4.3
PARTITION                    2P-GEN-PARTITION
   Lemma 4.1                    Lemma 4.4
ALT                          2P-ALT
   Corollary 4.1                Corollary 4.2
SUBSET SWITCH                2P-SUBSET-SWITCH
   Theorem 4.1                  Theorem 4.2
SUM_{d≤2}                    SUM_{d≤2}

Figure 4.2: Comparing the Structure of the Proofs
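The common shape of these two person games can be made concrete with 2P-GEN-PARTITION. The sketch below is a hypothetical illustration (the function name, argument layout, and sample numbers are not from the thesis): the players alternately pick one number from each pair, Left moving first, and Left then completes the sum with any subset of the remaining numbers. The recursion depth equals the number of pairs, which mirrors the observation that a simple exhaustive search needs only polynomial space.

```python
from itertools import combinations


def left_wins(pairs, ys, B, chosen=0, i=0):
    """Exhaustive search for 2P-GEN-PARTITION.

    pairs[i] holds the two numbers offered at turn i; Left chooses at
    even i, Right at odd i. After all pairs are decided, Left may add
    any subset of ys and wins iff the total equals B."""
    if i == len(pairs):
        need = B - chosen
        return any(sum(c) == need
                   for r in range(len(ys) + 1)
                   for c in combinations(ys, r))
    outcomes = (left_wins(pairs, ys, B, chosen + v, i + 1) for v in pairs[i])
    # Left needs one good choice; Right's choices must all fail to stop him.
    return any(outcomes) if i % 2 == 0 else all(outcomes)
```

With the made-up instance `pairs = [(1, 2), (4, 20)]` and `ys = [16]`, Left can force a total of 21 (by taking 1) or 22 (by taking 2), but has no way to force 23.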

What follows are the reductions used to prove that $\mathrm{SUM}_{d \le 2}$ is
PSPACE-complete. The first two reductions (QBF to 2P-EXACT-COVER and
2P-EXACT-COVER to 2P-GEN-PARTITION) are given in [17]. They are
repeated here for completeness.


Lemma 4.2 2P-EXACT-COVER is PSPACE-complete.


Proof: 2P-EXACT-COVER is obviously in PSPACE. Since the size of
the game decreases, a simple exhaustive search with a stack can solve
2P-EXACT-COVER using a polynomial amount of space.

QBF can be viewed as a game between two players, called Exist and
Forall. Given a formula $\psi = \exists x_1 \forall x_2 \ldots \phi$, Exist begins play by choosing if
$x_1$ is to be set to true or false. Forall then chooses if $x_2$ is to be set to true
or false. Play continues with Exist and Forall alternating turns until all
the variables have been given a truth assignment. Exist wins iff the truth
assignment of the variables makes $\phi$ true.

The basic idea is to convert an arbitrary formula $\psi = \exists x_1 \forall x_2 \ldots \phi$ into
an instance of 2P-EXACT-COVER such that Left's (Right's) decision over
whether to choose $X_i$ or $\bar{X}_i$ is equivalent to Exist's (Forall's) decision of
whether to set $x_i$ to be true or false. It will be shown that Left will be
able to choose the $Y_i$ subsets to form an exact cover iff $\phi$ is true under the
specified truth assignments.

Without loss of generality, assume that the instance of QBF is in
quantified 3SAT form, i.e.,

$$\psi = \exists x_1 \forall x_2 \cdots \exists x_{2p-1} \forall x_{2p}\, \phi$$

where $\phi = C_1 \wedge \cdots \wedge C_m$, and each clause $C_j$ is the disjunction of exactly
three literals or negated literals. Morris transforms $\psi$ into an instance of
2P-EXACT-COVER via the following construction:

1. The grand set X contains the following 4m points:
   (a) One point, $C_j$, for each clause of $\phi$.
   (b) One point, $x_i^j$, for each i, j such that $x_i$ is a literal in
       clause $C_j$.
   (c) One point, $\bar{x}_i^j$, for each i, j such that $\bar{x}_i$ is in clause $C_j$.
2. The subset $X_i$ contains all points in X of the form $x_i^j$.
3. The subset $\bar{X}_i$ contains all points in X of the form $\bar{x}_i^j$.
4. The 6m $Y_i$ subsets are defined as follows:
   (a) One subset for each point $x_i^j$ in X.
   (b) One subset for each point $\bar{x}_i^j$ in X.
   (c) One subset for each pair of points $(x_i^j, C_j)$ and $(\bar{x}_i^j, C_j)$
       in X.

² Note, in 2P-GEN-PARTITION, read as "two person generalized partition", B can be
any value, and not necessarily equal to half of the sum of the given numbers.

Figure 4.3: Transforming a Formula into an Instance of 2P-EXACT-COVER


For example, Figure 4.3 shows the transformation of a formula into an 
instance of 2P-EXACT-COVER. 
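The construction can be sketched in a few lines. Everything named below is an assumption made for illustration: clauses are given as 3-tuples of signed integers (+i for $x_i$, -i for its negation), points are tagged tuples, and the function `build_exact_cover` is not taken from [17]. The sketch also assumes each clause mentions three distinct variables, so that the point counts 4m and 6m come out as stated.

```python
def build_exact_cover(clauses):
    """Return (points, X, Xbar, Y) for a 3-CNF given as signed-int triples."""
    points = set()
    X, Xbar, Y = {}, {}, []
    for j, clause in enumerate(clauses, start=1):
        clause_point = ("C", j)
        points.add(clause_point)
        for lit in clause:
            i = abs(lit)
            X.setdefault(i, set())
            Xbar.setdefault(i, set())
            # One point per occurrence of a literal in a clause.
            p = ("x" if lit > 0 else "xbar", i, j)
            points.add(p)
            (X if lit > 0 else Xbar)[i].add(p)
            # Two Y subsets per occurrence: the point alone, and the
            # point paired with its clause point.
            Y.append(frozenset([p]))
            Y.append(frozenset([p, clause_point]))
    return points, X, Xbar, Y
```

For a formula with m clauses this yields 4m points and 6m Y subsets, matching the counts in the construction.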
There is a direct correspondence between play in an instance of QBF
and play in the corresponding instance of a 2P-EXACT-COVER game. If
a player sets a variable to false in QBF, then that variable cannot be used
to turn on any clause. This is analogous to a player in the 2P-EXACT-COVER
game selecting the subset $X_i$, since it prevents Left from using
any $(x_i^j, C_j)$ subset to cover $C_j$. Similarly, if a player in QBF sets a variable
to true, then that variable turns on any clause that contains it. This is
analogous to a player in the 2P-EXACT-COVER game selecting subset $\bar{X}_i$ and
enabling any $(x_i^j, C_j)$ to cover $C_j$.

Since the reduction can be done in polynomial time, and since the QBF
game can be solved iff the 2P-EXACT-COVER game can be solved,
2P-EXACT-COVER is PSPACE-complete. ∎


Lemma 4.3 2P-GEN-PARTITION is PSPACE-complete.


Proof: 2P-GEN-PARTITION is obviously in PSPACE. Since the size
of the game decreases, a simple exhaustive search with a stack can solve
2P-GEN-PARTITION using a polynomial amount of space.

Hence, it is sufficient to reduce 2P-EXACT-COVER to 2P-GEN-PARTITION.
Let:

$$X_1, \bar{X}_1, \ldots, X_n, \bar{X}_n, Y_1, \ldots, Y_m$$

be the subsets of X in an instance of 2P-EXACT-COVER. To convert it
into an instance of 2P-GEN-PARTITION is easy. If the set X contains k
elements, then each subset is represented as a k digit, base four number.
Each digit in the number represents an element in the set. The ith digit of
the number is set to one iff the subset contains the ith element of the set.
Otherwise it is set to zero. B is set to the k digit base four number
111...111, i.e., $(4^k - 1)/3$.

Consider a typical solution to the instance of 2P-GEN-PARTITION
constructed above. Since the sum of the selected numbers can never
produce a carry, the only way to achieve B is to have, for all i, exactly one
of the selected numbers have the ith digit set to one. Hence, the solution
represents an exact cover of X.

Since the reduction can be done in polynomial time, 2P-GEN-PARTITION
is PSPACE-complete. ∎
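The base four encoding is easy to experiment with. In this sketch the names `encode`, `target`, and `is_exact_cover` are invented for illustration, and the target is taken to be the base four numeral consisting of k ones, that is $(4^k - 1)/3$. The equivalence between "sum equals target" and "exact cover" relies on no digit position receiving four or more ones, which holds in the construction because each point lies in at most three of the subsets.

```python
def encode(subset, universe):
    """Base four number whose i-th digit is 1 iff universe[i] is in subset."""
    return sum(4 ** i for i, element in enumerate(universe) if element in subset)


def target(universe):
    """The base four numeral 11...1 with one digit per element."""
    return (4 ** len(universe) - 1) // 3


def is_exact_cover(family, universe):
    """True iff every element lies in exactly one member of family."""
    return all(sum(element in s for s in family) == 1 for element in universe)
```

For a universe of four elements, the family `[{0, 1}, {2, 3}]` sums to the target, while `[{0, 1}, {1, 2}, {3}]` does not, because element 1 is covered twice and element 0... is covered once but the digit for element 1 becomes 2.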
Lemma 4.4 2P-ALT is PSPACE-complete.


Proof: 2P-ALT is obviously in PSPACE. Since the size of the game
decreases, a simple exhaustive search with a stack can solve 2P-ALT using
a polynomial amount of space.

Let the value b and the numbers:

$$x_0, \bar{x}_0, \ldots, x_n, \bar{x}_n, y_1, \ldots, y_m$$

be an instance of 2P-GEN-PARTITION. Without loss of generality, assume
that the largest value requires k bits to represent, and that n is odd. Then,
the following instance of 2P-ALT is solvable iff the given instance of
2P-GEN-PARTITION is solvable:

$$X_0, \bar{X}_0, \ldots, X_n, \bar{X}_n, Y_{11}, Y_{12}, \ldots, Y_{m1}, Y_{m2}$$

where

$$\begin{aligned}
s &= 2m + \log(m) + k \\
t &= s + n \\
X_i &= \begin{cases} 2^{t-i} + x_i & \text{if } i \text{ even} \\ 2^{t-i} - x_i & \text{if } i \text{ odd} \end{cases} \\
\bar{X}_i &= \begin{cases} 2^{t-i} + \bar{x}_i & \text{if } i \text{ even} \\ 2^{t-i} - \bar{x}_i & \text{if } i \text{ odd} \end{cases} \\
Y_{i1} &= 2^{s-2i} + y_i \\
Y_{i2} &= 2^{s-2i} \\
B &= 2^{t-1} + 2^{t-3} + \cdots + 2^{s+1} + b
\end{aligned}$$


The high order bits of the $X_i$ and $\bar{X}_i$ guarantee that their values are
decreasing in size as i increases. This ensures that in the alternating sum, the
values Left selects will be added into the sum and that the values Right
selects will be subtracted from the sum.

Ignoring the high order bits of $X_i$ and $\bar{X}_i$, the effect of selecting $x_i$ in
2P-GEN-PARTITION and the effect of selecting $X_i$ in 2P-ALT is the same.
If i is even, then in either case the value of $x_i$ is added into the sum. If i is
odd, then in the case of 2P-GEN-PARTITION, the value $x_i$ is added into
the sum. In the case of 2P-ALT, the value $-x_i$ is subtracted from the sum
which is equivalent to adding $x_i$ into the sum.

Unfortunately, the high order bits do have a secondary effect of adding
into the sum $2^{t-1} + 2^{t-3} + \cdots + 2^{s+1}$. However, this extra amount is accounted
for in the value of B.

The construction of the $Y_{i1}$ and $Y_{i2}$ is exactly the same as in the
reduction of PARTITION to ALT. Again the construction guarantees that if
Left selects $Y_{i1}$ he must also select $Y_{i2}$. The net result of selecting $Y_{i1}$ and
$Y_{i2}$ is that $y_i$ is added to the sum.

Hence, there is a simple and direct correspondence between a solution
to the given instance of 2P-GEN-PARTITION and 2P-ALT. $x_i$ is in the
solution of 2P-GEN-PARTITION iff $X_i$ is in the solution of the
corresponding instance of 2P-ALT. $\bar{x}_i$ is in the solution of 2P-GEN-PARTITION iff
$\bar{X}_i$ is in the solution of the corresponding instance of 2P-ALT. $y_i$ is in the
solution of 2P-GEN-PARTITION iff $Y_{i1}$ and $Y_{i2}$ are in the solution of the
corresponding instance of 2P-ALT.

Since the reduction can be done in polynomial time, 2P-ALT is
PSPACE-complete. ∎


Corollary 4.2 2P-SUBSET-SWITCH is PSPACE-complete.


Proof: 2P-ALT and 2P-SUBSET-SWITCH are obviously equivalent since
the value for Left when starting play in the sum of unbiased switches is the
alternating sum of the component values of the switches. ∎


Theorem 4.2 $\mathrm{SUM}_{d \le 2}$ is PSPACE-complete.


Proof: SUMa<2 is obviously in PSPACE. Since the size of the game 
decreases, a simple exhaustive search with a stack can solve SUMua<2 using 
a polynomial amount of space. 

Hence, it is sufficient to reduce 2P-SUBSET-SWITCH to
$\mathrm{SUM}_{d \le 2}$. The difference between a 2P-SUBSET-SWITCH game and a
SUBSET SWITCH game is that a 2P-SUBSET-SWITCH game contains an extra stage
where Left and Right alternately select switches. Hence, it is sufficient to
add component games to the sum constructed in Theorem 4.1 such that optimal
play in the sum contains an extra stage where Left and Right alternately
decide which switches are to remain in the game. In particular, to transform
an arbitrary instance of 2P-SUBSET-SWITCH consisting of the value S
and the switches

\[
\{x_1 \mid -x_1\}, \{\bar{x}_1 \mid -\bar{x}_1\}, \ldots,
\{x_{2n} \mid -x_{2n}\}, \{\bar{x}_{2n} \mid -\bar{x}_{2n}\},
\{y_1 \mid -y_1\}, \ldots, \{y_m \mid -y_m\}
\]

into an instance of $\mathrm{SUM}_{d \le 2}$, construct the game

\[
J + K_{11} + K_{12} + \cdots + K_{n1} + K_{n2}
  + G_{11} + G_{12} + \cdots + G_{m1} + G_{m2} + H + I + T
\]


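where
\begin{align*}
J      &\to \{\{\infty \mid \textstyle\sum_i A_i - C\} \mid -\infty\} \\
K_{i1} &\to \{\{x_{2i-1} \mid -x_{2i-1}\}, \{\bar{x}_{2i-1} \mid -\bar{x}_{2i-1}\} \mid -F_{2i-1}\} \\
K_{i2} &\to \{F_{2i} \mid \{x_{2i} \mid -x_{2i}\}, \{\bar{x}_{2i} \mid -\bar{x}_{2i}\}\} \\
G_{j1} &\to \{\{y_j \mid -y_j\} \mid -A_j\} \\
G_{j2} &\to \{0 \mid -A_j\} \\
H      &\to \{C \mid -C\} \\
I      &\to \{D \mid -S, \{S \mid -E\}\} \\
T      &\to \{0 \mid \{0 \mid 0\}\}
\end{align*}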
and 
F; > 4% Fy 
Fy > 4% A, 
Aj > 4% Ajy1 
A, > 4#(C+D+£4+5S5+ 2,2) 
C > D+ LE 
D > 4*E 


E > 2 Dot yj. 

The only significant difference between this game and the game con- 
structed for Theorem 4.1 is that this game contains a number of K com- 
ponent games. Optimal play in both games is very similar. 

As in the game constructed for Theorem 4.1, optimal play begins with
Left moving in J and Right responding by taking $\sum_i A_i - C$. If Left
does not move in J, Right can gain $-\infty$ and win. Similarly, after Left
plays in J, he is threatening to gain $\infty$. Right must respond in the
same game to prevent this threat.

Next, Left must play in $K_{11}$. If he does not, Right will pick up $-F_1$.
Since $F_1$ is greater than the sum of all other values in the sum, Left is
better off playing in $K_{11}$ than playing first in all other games.

Similarly, Right then must play in $K_{12}$. If he does not, Left will pick
up $F_2$. Since $F_2$ is greater than the sum of all other values in the sum,
Right could play first in all other games and still not recoup his loss.

By repeating the above argument, it is easy to see that the K games will
be played out in order. Left and Right will alternately make the decision
of whether to include $\{x_i \mid -x_i\}$ or $\{\bar{x}_i \mid -\bar{x}_i\}$
in the rest of the game.

After the $K_i$ games have been played out, the game is exactly like
the game constructed in Theorem 4.1 except that the game contains an
additional 2n switches. However, the additional switches do not affect
optimal play as the games $G_{j1}$, $G_{j2}$, H, and J are played out.

So, as in the game constructed in Theorem 4.1, play then proceeds in the
G games. Left begins by playing in either $G_{11}$ or $G_{12}$, and hence
deciding whether the switch $\{y_1 \mid -y_1\}$ should be selected or not.
Right responds in the other game. Play proceeds down the pairs of $G_j$'s,
with Left choosing whether or not to select each $\{y_j \mid -y_j\}$.

After the G games are played out, Left must play in H and Right gains
control over the play.

Let L be $V_L$ of the remaining switches. Right must decide which option
of I to play in. If he plays in the first option, Left will start the play
in the remaining switches. The switches will be played out and the final
score will be $-S + L$. If he plays in the second option, Left will respond
in $\{S \mid -E\}$, taking S, and Right will start the play in the switches.
The final score will be $S - L$.

Thus, Left will win iff both of Right's options (taking $S - L$ or $L - S$)
result in a nonnegative final score. This is true iff Left solved the
2P-SUBSET-SWITCH problem.

Since the reduction can be done in polynomial time,
$\mathrm{SUM}_{d \le 2}$ is PSPACE-complete. ∎
Chapter 5
Coping with SUM 


Since SUM is PSPACE-complete, it is unlikely that an efficient algorithm
for solving it will be discovered. Two alternative approaches exist for
coping with an instance of SUM. The first is to relax the criteria of
success: instead of requiring optimal solutions, an algorithm is only
required to produce a solution that is close to optimal. The second
alternative is to do an exponential time search for the optimal solution.

Heuristic solutions are obviously beneficial when speed is critical and 
small errors in the solution are tolerable. 

A “good” though not optimal solution to SUM can have one of two 
forms. It can approximate the value of the final score when both players 
play optimally, or it can heuristically choose a move. This thesis focuses 
on algorithms that approximate the final score to within a known error. 

If $P \neq PSPACE$, then there are constraints on how good any polynomial
time approximation can be. Assume that an efficient heuristic solution
approximates the value of the final score under optimal play to within some
specified accuracy. By a simple "change of currency" argument, it is easy
to show that the accuracy cannot be within a constant of the optimal
solution. Similarly, the accuracy cannot be within a multiple of the optimal
solution, since determining whether the final score of an instance of SUM
equals zero is PSPACE-complete.
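
The "change of currency" argument can be sketched as follows. This is only an illustration under assumptions made here and not stated above: all end positions have integer values, and A is a hypothetical polynomial time algorithm whose output is always within a positive integer constant c of the optimal final score.

```latex
% Hypothetical setting: |A(G) - V_L(G)| <= c for every sum G with integer
% end positions. Let kG denote G with every end position multiplied by the
% integer k = 2c + 1 (the "change of currency").
\begin{align*}
V_L(kG) &= k \, V_L(G), \\
\lvert A(kG) - k \, V_L(G) \rvert &\le c < \tfrac{k}{2}.
\end{align*}
% k V_L(G) is therefore the unique multiple of k within c of A(kG).
% Rounding A(kG) to the nearest multiple of k and dividing by k recovers
% V_L(G) exactly, which would solve SUM in polynomial time.
```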

However, Hanner [7] proved that it is possible to approximate the final
score of a sum of games to within an accuracy dependent solely upon the
"worst" component game. His bounds are independent of the complexity of the
component games, and of the number of component games. In particular, he
proved that in the sum $G_1 + \cdots + G_n$, it is possible to approximate
the final score to within $\max\{\sigma(G_i) \mid 1 \le i \le n\}$ of the
optimum solution.

This chapter presents a revised version of Hanner's proof. Hanner's
result was obtained by imagining that each player in a game is forced to
pay a tax for the privilege of moving. The fact that this yields results
which apply to normal play seems to be a bit magical. This chapter recasts
the work in a more intuitive light, and develops the mathematical notation
required for a clean and concise proof.

This chapter also improves upon Hanner's result to show that the final
score of $G_1 + \cdots + G_n$ can be approximated to within the second
largest $\sigma(G_i)$.

The second approach to handling an instance of SUM is to perform a
min-max search for the optimal solution. Such algorithms require
exponential time. However, their running time can be greatly improved by
pruning techniques. This chapter considers how the approximate solutions
developed in this chapter can be used to guide and prune the search for an
optimal solution. Particular attention will be paid to the use of the
approximate algorithms in conjunction with Berliner's B* [4] search
algorithm.

One consequence of using B* is that it increases the motivation for
finding better approximate solutions. The better the approximate solution,
the more "informed" and hence the better the B* search will be.


5.1 Heuristic Solutions 


This section places bounds on the final score of a sum of games. The basic
approach is to define a heuristic strategy, and then show that if Left uses
the strategy he can force the final score to be at least some minimum value,
and if Right uses the strategy he can hold down the final score to be at most
some maximum value. Hence, the true value of the game is guaranteed to
be between the minimum and maximum value.

This section outlines four different results. The first, by Milnor [16],
is obtained via a follow the leader strategy. The second, by Hanner [7], is
obtained via the mean strategy. The last two results are improvements on
Hanner's.
5.1.1 Follow the Leader Strategy


The basic strategy used for approximating the value of a game is the fol- 
lowing follow the leader strategy: 


1. Only play locally optimal moves. 
2. Whenever possible, play in the same game as the opponent. 


The purpose of the strategy is to simplify the play in a sum of games.
Each component game is played out as if it were the sole game in the sum.
Thus, it is possible to approximate the final score for G in terms of its
component games.

The follow the leader strategy was first used by Milnor [16]¹ to bound
the sum of two games. If $G = G_1 + \cdots + G_n$, then the generalized
version of the result is:

\[
V_L(G_1) + V_R(G_2) + \cdots + V_R(G_n) \le V_L(G) \le V_L(G_1) + \cdots + V_L(G_n).
\]
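
The inequality can be checked on small examples by exhaustive search. The sketch below is illustrative only: the tuple encoding of games, the helper names, and the three example switches are assumptions made here, and every non-number position is assumed to offer both players at least one option.

```python
# A game is either a number (an end position) or a pair
# (left_options, right_options), each a tuple of games.

def is_number(g):
    return isinstance(g, (int, float))

def v_left(g):
    """Final score of a single game when Left starts and both play optimally."""
    if is_number(g):
        return g
    return max(v_right(option) for option in g[0])

def v_right(g):
    """Final score of a single game when Right starts and both play optimally."""
    if is_number(g):
        return g
    return min(v_left(option) for option in g[1])

def sum_value(components, left_to_move=True):
    """Exhaustive min-max value of a sum; components is a tuple of games."""
    if all(is_number(c) for c in components):
        return sum(components)
    outcomes = []
    for i, c in enumerate(components):
        if is_number(c):
            continue  # no moves are made in a finished component
        for option in (c[0] if left_to_move else c[1]):
            rest = components[:i] + (option,) + components[i + 1:]
            outcomes.append(sum_value(rest, not left_to_move))
    return max(outcomes) if left_to_move else min(outcomes)

def milnor_bounds(components):
    """Lower and upper bounds on V_L of the sum, as in the inequality above."""
    lower = v_left(components[0]) + sum(v_right(c) for c in components[1:])
    upper = sum(v_left(c) for c in components)
    return lower, upper

def switch(a, b):
    """The game {a | b}."""
    return ((a,), (b,))

games = (switch(5, -5), switch(3, -3), switch(1, -1))
lower, upper = milnor_bounds(games)
exact = sum_value(games)
print(lower, exact, upper)  # 1 3 9
```

For these three switches the exact value 3 lies well inside the interval [1, 9], which illustrates how loose the bounds become as more components are added.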


To prove the right hand side of the inequality, assume that Right adopts
the follow the leader strategy. If Right always plays in the same game as
Left, then each component game $G_i$ will be played out as if it were the
only game in the sum. Furthermore, if both Left and Right play locally
optimal moves, then the final score of each component game will be
$V_L(G_i)$. Thus, the final score of the sum will be

\[
V_L(G_1) + \cdots + V_L(G_n).
\]

Left cannot increase this final score. If Left plays any move that is not
locally optimal, the final score will be the same or lower.

If in one of the games Left plays into an end position, Right will be
forced to play twice (or first) in some other game. Since it is always to
Right's advantage to play twice in a game (or to start a game), this only
serves to lower the final score. Hence by using the strategy Right can
guarantee that

\[
V_L(G_1 + \cdots + G_n) \le V_L(G_1) + \cdots + V_L(G_n).
\]

¹ Milnor's and Hanner's model of a game is slightly different from Conway's.
They do not make a clean distinction between a number and a game. However,
their results hold in Conway's model.
Similarly, to prove the left hand side of the inequality, assume that Left
follows Right and responds with locally optimal moves. Also assume that
Left's first move is the locally optimal move in $G_1$ to $G_1^L$. If Left is
always able to play in the same game as Right, and if Right always makes
locally optimal moves, then the final score will be

\[
V_R(G_1^L) + V_R(G_2) + \cdots + V_R(G_n).
\]

The situation can only improve for Left if Right does not make locally
optimal moves, or if Right plays into an end position, forcing Left to play
twice (or first) in another game. Since $V_R(G_1^L) = V_L(G_1)$, this yields:

\[
V_L(G_1) + V_R(G_2) + \cdots + V_R(G_n) \le V_L(G_1 + \cdots + G_n). \qquad ∎
\]


5.1.2 Mean Strategy 


The follow the leader strategy, as used by Milnor, is not very powerful. The
game is played out in a way that strongly favors the "leader". Hanner [7]
improves on Milnor's result by devising a more powerful follow the leader
strategy. In particular, he defines the mean strategy, which is a follow the
leader strategy where the follower always plays t-optimal moves. Using the
strategy, he proves that if $G = G_1 + \cdots + G_n$ then

\[
\mathrm{Mean}(G) \le V_L(G) \le \mathrm{Mean}(G) + \max_i \sigma(G_i).
\]

Hanner's bounds are surprisingly accurate. They are dependent solely
on the hottest game. Thus, unlike Milnor's bounds, the accuracy of the
bounds does not decrease as the number of games in the sum increases. Any
number of cooler games can be included in the sum, and the accuracy
remains the same. For example, Figure 5.1 compares the accuracy of Hanner's
and Milnor's bounds.
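
The comparison can be reproduced by plain arithmetic from the per-game values listed in Figure 5.1. The sketch below is illustrative: the dictionary, the function names, and the assignment of the four value columns to the games H18 through H21 are assumptions made here.

```python
# Per-game values read from Figure 5.1: (V_L, V_R, Mean, sigma).
# The mapping of columns to H18..H21 is an assumption of this sketch.
FIGURE_5_1 = {
    "H18": (30, -25, 5, 30),
    "H19": (20, -20, 0, 20),
    "H20": (35, 0, 15, 20),
    "H21": (15, -10, 5, 15),
}

def milnor_bounds(names):
    """V_L(G_1) + V_R(G_2) + ... + V_R(G_n) and V_L(G_1) + ... + V_L(G_n)."""
    v_l = [FIGURE_5_1[name][0] for name in names]
    v_r = [FIGURE_5_1[name][1] for name in names]
    return v_l[0] + sum(v_r[1:]), sum(v_l)

def hanner_bounds(names):
    """Mean(G) and Mean(G) plus the largest sigma among the components."""
    mean = sum(FIGURE_5_1[name][2] for name in names)
    hottest = max(FIGURE_5_1[name][3] for name in names)
    return mean, mean + hottest

for names in (["H18", "H19"], ["H18", "H19", "H20"]):
    m_lo, m_hi = milnor_bounds(names)
    h_lo, h_hi = hanner_bounds(names)
    print(" + ".join(names),
          "| Milnor:", m_lo, m_hi, "width", m_hi - m_lo,
          "| Hanner:", h_lo, h_hi, "width", h_hi - h_lo)
```

The width of Milnor's interval grows from 40 to 75 when the third game is added, while the width of Hanner's interval stays at 30, the temperature of the hottest component.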

Before defining t-optimal moves, it is necessary to generalize the
notation for ${}^{t}L_n(G)$, ${}^{t}R_n(G)$, ${}^{t}V_L(G)$, and
${}^{t}V_R(G)$ to include the concept of a player using a specific strategy.
In particular:


${}^{t}L_n^{xy}(G)$ → The position that results after n moves when a tax
of t is imposed on each move, both players play alternately,
Left starts, Left plays according to strategy x, and Right
plays according to strategy y. If no strategy is specified,
then an optimal strategy is assumed.
              H18    H19    H20    H21
  V_L          30     20     35     15
  V_R         -25    -20      0    -10
  Mean          5      0     15      5
  sigma        30     20     20     15

                     H18 + H19         H18 + H19 + H20    H18 + H19 + H20 + H21
  Milnor bounds    10 <= V_L <= 50    10 <= V_L <= 85      0 <= V_L <= 100
  Milnor accuracy        40                 75                  100
  Hanner bounds     5 <= V_L <= 35    20 <= V_L <= 50     25 <= V_L <= 55
  Hanner accuracy        30                 30                   30

Figure 5.1: Comparing the Accuracy of Hanner's and Milnor's Results
${}^{t}R_n^{xy}(G)$ → has the same meaning as ${}^{t}L_n^{xy}(G)$ except that
Right starts the play in G.


${}^{t}V_L^{xy}(G)$ → is the final score when a tax of t is imposed on
each move, both players play alternately, Left starts, Left
plays according to strategy x, and Right plays according
to strategy y. If no strategy is specified, then an optimal
strategy is assumed.


${}^{t}V_R^{xy}(G)$ → is the same as ${}^{t}V_L^{xy}(G)$ except that Right
starts the play in G.


Furthermore, the following notation is used for referring to particular
strategies:


• "O" → optimal strategy


• "-" → arbitrary strategy


• "t" → t-optimal strategy (to be defined).


For example, $R_5^{-O}G$ is the position reached after 5 moves in G when
Left plays an arbitrary strategy, Right plays optimally, and Right plays
first. ${}^{t}V_L^{O-}(R_5^{-O}G)$ is the final score when $R_5^{-O}G$ is
played such that Left plays only optimal moves, Right plays an arbitrary
strategy, Left starts the play, and a tax of t is applied to each move.

In the notation for ${}^{t}L_n^{xy}(G)$, ${}^{t}R_n^{xy}(G)$,
${}^{t}V_L^{xy}(G)$, and ${}^{t}V_R^{xy}(G)$, the strategies x and y are
defined after the rules of the game are defined, i.e., the strategy is
dependent upon whether a tax of t is imposed on every move. For example,
in general, ${}^{t}L_1^{O-}(G) \neq L_1^{O-}(G)$ since the optimal move in a
taxed game is different from the optimal move in an untaxed game.

A t-optimal move in an (untaxed) game is defined to be the move that
would be optimal if a tax of t was imposed on every move in the game.
However, t-optimal moves only exist for $t \le \sigma(G)$. If a tax of
$t > \sigma(G)$ is imposed on every move in G, then it is to neither
player's advantage to move first and the game is a number. Hence, the
t-optimal move in the untaxed version of the game is undefined. Thus, a
t-optimal move for Left is defined as follows:

\[
L_1^{t-}(G) =
\begin{cases}
{}^{t}L_1^{O-}(G) & \text{if } t \le \sigma(G) \\
\text{undefined}  & \text{if } t > \sigma(G)
\end{cases}
\qquad (5.1)
\]
Figure 5.2: t-optimal moves in $H_{22}$. The leaves of $H_{22}$ are 12, 2, 20,
and 10, and $\sigma(H_{22}) = 8$. As t increases, Left's t-optimal move
changes from 12 to {20 | 10} and finally becomes undefined; Right's
t-optimal move is 2 until it too becomes undefined.


The "t" in "t-optimal" is a variable that can take on different values.
For example, Figure 5.2 shows the t-optimal moves in $H_{22}$ for various
values of t. Note that a 0-optimal move is equivalent to an optimal move,
since a tax of zero is equivalent to no tax.
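
The dependence of the t-optimal move on t can be illustrated with a small computation. The sketch below rests on assumptions made here, not on definitions given in this section: the tax is simply charged on every move and play continues until a number is reached; a move is treated as undefined once the taxed value for Left drops below the taxed value for Right; and $H_{22}$ is taken to be {12, {20 | 10} | 2}, a guess at the tree in Figure 5.2. The function names and the tuple encoding are hypothetical.

```python
# A game is a number or a pair (left_options, right_options) of tuples.

def is_number(g):
    return isinstance(g, (int, float))

def taxed_v_left(g, t):
    """Final score when Left starts and each mover pays a tax of t."""
    if is_number(g):
        return g
    return max(taxed_v_right(option, t) for option in g[0]) - t

def taxed_v_right(g, t):
    """Final score when Right starts and each mover pays a tax of t."""
    if is_number(g):
        return g
    return min(taxed_v_left(option, t) for option in g[1]) + t

def t_optimal_left_move(g, t):
    """Left's best option under a tax of t, or None when moving does not pay."""
    if taxed_v_left(g, t) < taxed_v_right(g, t):
        return None  # the tax exceeds what a move is worth
    return max(g[0], key=lambda option: taxed_v_right(option, t))

# Assumed structure of H22: Left may move to 12 or to {20 | 10}; Right to 2.
H22 = ((12, ((20,), (10,))), (2,))

for t in (0, 1, 4, 8, 9):
    print(t, t_optimal_left_move(H22, t))
```

Under these assumptions the two taxed values of $H_{22}$ meet at t = 8, which agrees with the value of $\sigma(H_{22})$ given in Figure 5.2, and Left's preferred move switches from 12 to {20 | 10} once t exceeds 2.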

Hanner provides some motivation for using a strategy with t-optimal
moves by stating that


When a player shall move in a sum of games he chooses one
game, say G, and there makes a move. Thereby he loses the
possibility to make the move in one of the other games. If the
value of this possibility is put equal to t it is natural to compare
the situation with the case when the player has to move in G
and pay the amount t to the other player when moving.


Unfortunately, Hanner does not expound any further on the nature of
t-optimal moves. The important question of how t-optimal moves work
remains.

One way to view Hanner's strategy is that it addresses a basic weakness
found in Milnor's strategy, i.e., tempo. In Milnor's strategy, the follower
is a wimp that passively responds to the leader's move, even when it is
obvious that the leader's move is not sente. For example, consider the game
shown in Figure 5.3. The first game is hot, and it is natural to assume that
the leader will play there. However, the follower gains very little by
responding to the leader's move.

Figure 5.3: A Sum Where the First Move Is Not Sente. The sum is
{{100 | 98} | {-98 | -100}} + {10 | 0}.

Figure 5.4: A Sum Where the Locally Optimal Response Is Incorrect. The
leaf values shown are 90, 100, 9, 10, and 0.

In Milnor's strategy, the follower's ignorance of tempo is also illustrated
by his exclusive use of locally optimal moves. A non-optimal move that
forces a response from one's opponent is often better than a locally optimal
move that can be ignored. For example, consider the game shown in Figure
5.4. Assume that the leader (Right) played in the first game. The locally
optimal response is to reduce the game to 10. However, from a global
perspective, it is better for the follower to reduce the game to {100 | 9},
force the leader to take 9, and then play first in the other component.

Hanner's strategy addresses the issue of tempo via taxation. Taxation
provides some insight into the value of a move. It was used in section 2.6
to efficiently compute the temperature of a game. Here, in the form of
t-optimal moves, it is used to provide the follower with some intuition
about tempo. In particular, before using Hanner's mean strategy it is
necessary to set the value of t. This sets a minimum threshold of t on the
value of a move. The effect is twofold.

First, t-optimal moves only exist if a move is worth at least t. Hence,
the follower can only respond in the same game as the leader when doing
so is worth at least t. For example, reconsider Figure 5.3. A t-optimal
response to the leader's move in the first game exists only when
$t \le 1$. That is, the follower can only respond in the first game if the
threshold is set to be less than or equal to one.

Second, setting the value of t sets a minimum threshold on the amount the
follower can lose in a local situation in order to keep sente. For example,
reconsider Figure 5.4. After the leader (Right) plays in the first game,
the follower (Left) has two options. Under Hanner's mean strategy, which
option is chosen depends upon the value of t. If $t > 1$ then the t-optimal
response is to move to {100 | 9}. That is, Left takes a local loss of 1 (he
gets 9 instead of 10) in order to move first in the other component game.
On the other hand, if $t < 1$, then the follower is unwilling to take a loss
of 1. The t-optimal response is to take 10.
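
The threshold of 1 can be checked directly. The comparison below is a sketch that assumes the position after Right's move offers Left exactly the two options discussed above, 10 and {100 | 9}, and that a tax of t is charged on every move, including Right's forced reply.

```latex
% Taxed outcome for Left of each response, measured locally.
\begin{align*}
\text{move to } 10:                 &\quad 10 - t \\
\text{move to } \{100 \mid 9\}:     &\quad (9 + t) - t = 9 \\
9 > 10 - t                          &\iff t > 1
\end{align*}
```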

What follows is a revised version of Hanner's proof. The proof is divided
into two parts. The first part proves that in a single game $G_i$, a player
playing $\sigma(G_i)$-optimal moves can force the mean of the final position
of $G_i$ to be within $\sigma(G_i)$ of $\mathrm{Mean}(G_i)$. The second part
proves that in a sum of games $G = G_1 + \cdots + G_n$, a player has a
strategy that forces the final score to be close to its mean value.


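Lemma 5.1 Assume $t \ge \sigma(G)$. Consider a sequence of n moves in G such
that Left always has a t-optimal move, i.e.,

\[
\forall i,\ 1 \le i \le n-1,\quad \sigma(L_i^{t-}(G)) \ge t.
\]

If Left always plays a t-optimal move, then the following inequalities
describe how Mean(G) compares to the mean of the final position in the
sequence, based upon who moves first and last in the sequence.

\begin{align*}
\mathrm{Mean}(L_{2k+1}^{t-}G) &\ge \mathrm{Mean}(G) + t \\
\mathrm{Mean}(L_{2k}^{t-}G)   &\ge \mathrm{Mean}(G)     && \text{if } \sigma(L_{2k}^{t-}G) \le t \\
\mathrm{Mean}(R_{2k}^{t-}G)   &\ge \mathrm{Mean}(G) \\
\mathrm{Mean}(R_{2k+1}^{t-}G) &\ge \mathrm{Mean}(G) - t && \text{if } \sigma(R_{2k+1}^{t-}G) \le t
\end{align*}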
In general, it is hard to prove anything about non-optimal moves in a
game. However, a t-optimal move in an untaxed game is the optimal move
in a taxed game. Hence, questions about t-optimal moves in an untaxed
game can be answered by considering optimal moves in a taxed game.

In particular, consider a sequence of moves in a taxed game. Assume
both players play optimally. If there are an even number of moves, then
the taxes Left pays to Right and the taxes Right pays to Left will cancel
out. If there are an odd number of moves, then one player will pay an extra
tax. By repeatedly applying equation 2.3 and its dual, this can be stated
algebraically as follows:

\begin{align*}
{}^{t}V_R({}^{t}L_{2k+1}(G)) &= {}^{t}V_L(G) + t \\
{}^{t}V_L({}^{t}L_{2k}(G))   &= {}^{t}V_L(G) \\
{}^{t}V_R({}^{t}R_{2k}(G))   &= {}^{t}V_R(G) \\
{}^{t}V_L({}^{t}R_{2k+1}(G)) &= {}^{t}V_R(G) - t
\end{align*}

This is the heart of the lemma. The rest of the proof simply massages
the above equations into the desired form.

The left hand sides of the above equations assume that both players play
optimally. However, if Right does not play optimally this only helps Left.
So:

\begin{align*}
{}^{t}V_R({}^{t}L_{2k+1}^{O-}(G)) &\ge {}^{t}V_R({}^{t}L_{2k+1}(G)) = {}^{t}V_L(G) + t \\
{}^{t}V_L({}^{t}L_{2k}^{O-}(G))   &\ge {}^{t}V_L({}^{t}L_{2k}(G))   = {}^{t}V_L(G) \\
{}^{t}V_R({}^{t}R_{2k}^{O-}(G))   &\ge {}^{t}V_R({}^{t}R_{2k}(G))   = {}^{t}V_R(G) \\
{}^{t}V_L({}^{t}R_{2k+1}^{O-}(G)) &\ge {}^{t}V_L({}^{t}R_{2k+1}(G)) = {}^{t}V_R(G) - t
\end{align*}

By equation 5.1, optimal moves in a taxed game are t-optimal moves in
a non-taxed game. So,

\begin{align*}
{}^{t}V_R(L_{2k+1}^{t-}(G)) &= {}^{t}V_R({}^{t}L_{2k+1}^{O-}(G)) \ge {}^{t}V_L(G) + t \\
{}^{t}V_L(L_{2k}^{t-}(G))   &= {}^{t}V_L({}^{t}L_{2k}^{O-}(G))   \ge {}^{t}V_L(G) \\
{}^{t}V_R(R_{2k}^{t-}(G))   &= {}^{t}V_R({}^{t}R_{2k}^{O-}(G))   \ge {}^{t}V_R(G) \\
{}^{t}V_L(R_{2k+1}^{t-}(G)) &= {}^{t}V_L({}^{t}R_{2k+1}^{O-}(G)) \ge {}^{t}V_R(G) - t
\end{align*}

The right hand side of all the above equations can be simplified since
by Equation 2.6, when $t \ge \sigma(G)$,
${}^{t}V_L(G) = {}^{t}V_R(G) = \mathrm{Mean}(G)$. So,

\begin{align*}
{}^{t}V_R(L_{2k+1}^{t-}(G)) &\ge {}^{t}V_L(G) + t = \mathrm{Mean}(G) + t \\
{}^{t}V_L(L_{2k}^{t-}(G))   &\ge {}^{t}V_L(G)     = \mathrm{Mean}(G) \\
{}^{t}V_R(R_{2k}^{t-}(G))   &\ge {}^{t}V_R(G)     = \mathrm{Mean}(G) \\
{}^{t}V_L(R_{2k+1}^{t-}(G)) &\ge {}^{t}V_R(G) - t = \mathrm{Mean}(G) - t
\end{align*}

Finally, the left hand side of the first and third equations can be
simplified since the mean value of a position is always greater than or
equal to the value for Right (equation 2.6 combined with equation 2.4). The
second and fourth equations can be simplified using equation 2.6, assuming
that σ of the final position in the sequence is no greater than t. So,

\begin{align*}
\mathrm{Mean}(L_{2k+1}^{t-}G) &\ge {}^{t}V_R(L_{2k+1}^{t-}(G)) \ge \mathrm{Mean}(G) + t \\
\mathrm{Mean}(L_{2k}^{t-}G)   &\ge {}^{t}V_L(L_{2k}^{t-}(G))   \ge \mathrm{Mean}(G)     && \text{if } \sigma(L_{2k}^{t-}G) \le t \\
\mathrm{Mean}(R_{2k}^{t-}G)   &\ge {}^{t}V_R(R_{2k}^{t-}(G))   \ge \mathrm{Mean}(G) \\
\mathrm{Mean}(R_{2k+1}^{t-}G) &\ge {}^{t}V_L(R_{2k+1}^{t-}(G)) \ge \mathrm{Mean}(G) - t && \text{if } \sigma(R_{2k+1}^{t-}G) \le t \qquad ∎
\end{align*}


Theorem 5.1 For the game $G = G_1 + \cdots + G_n$, let:

\begin{align*}
\mathrm{Mean}(G) &= \mathrm{Mean}(G_1 + \cdots + G_n) \\
\sigma &= \max\{\sigma(G_k) \mid 1 \le k \le n\}
\end{align*}

If Left is the first player to move, then:

\[
\mathrm{Mean}(G) \le V_L(G_1 + \cdots + G_n) \le \mathrm{Mean}(G) + \sigma
\]


Proof: The proof proceeds by induction on the number of moves in a game.
Let $l(G)$ be the maximum number of moves that can be played in G.

Basis: If $l(G_1 + \cdots + G_n) = 0$ then all $G_k$, $1 \le k \le n$, are
end positions, i.e., numbers. Thus, $\sigma(G_k) = 0$ and
$\mathrm{Mean}(G_k) = V_L(G_k)$. The theorem holds.

Induction Step: Assume the theorem holds for all games such that
$l(G_1 + \cdots + G_n) \le m$. It will be shown that it holds for all games
such that $l(G_1 + \cdots + G_n) = m + 1$.

The proof will center around the mean strategy. The mean strategy is
the follow the leader strategy described below:


1. Always play in the same game as your opponent except if
you must make the first move.


2. Only play σ-optimal moves.

It will not always be possible to play according to the above strategy since
σ-optimal moves do not always exist. In particular, the strategy is
considered valid only until one of the following two conditions occurs:


α: The opponent plays in $G_i$, leaving a position $P_i$ such that
$\sigma(P_i) < \sigma$.


β: Positions $P_i$, $1 \le i \le n$, have been reached for which
$\sigma(P_i) < \sigma$.


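To prove the left hand side of the inequality, it is sufficient to show
that if Left uses the mean strategy until one of the two stopping conditions
occurs, then G will be reduced to a game $P = P_1 + \cdots + P_n$ such that
if it is Left's turn to play in P then $V_L(P) \ge \mathrm{Mean}(G)$, whereas
if it is Right's turn to play in P then $V_R(P) \ge \mathrm{Mean}(G)$.

Consider how the mean of each game $P_i$ compares to the mean of the
corresponding game $G_i$. Each $P_i$ will have one of the following four
forms: $L_{2k+1}^{\sigma-}G_i$, $L_{2k}^{\sigma-}G_i$,
$R_{2k+1}^{\sigma-}G_i$, or $R_{2k}^{\sigma-}G_i$. Lemma 5.1, applied with
$t = \sigma$, specifies that for each i, $1 \le i \le n$:

\begin{align*}
\mathrm{Mean}(L_{2k+1}^{\sigma-}G_i) &\ge \mathrm{Mean}(G_i) + \sigma \\
\mathrm{Mean}(L_{2k}^{\sigma-}G_i)   &\ge \mathrm{Mean}(G_i)          && \text{if } \sigma(L_{2k}^{\sigma-}G_i) \le \sigma \\
\mathrm{Mean}(R_{2k}^{\sigma-}G_i)   &\ge \mathrm{Mean}(G_i) \\
\mathrm{Mean}(R_{2k+1}^{\sigma-}G_i) &\ge \mathrm{Mean}(G_i) - \sigma && \text{if } \sigma(R_{2k+1}^{\sigma-}G_i) \le \sigma
\end{align*}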


It is important to note that if Right makes the last move in the ith
component game, reducing it to $P_i$, then it must be the case that
$\sigma(P_i)$ is less than σ (otherwise Left would have responded with a
σ-optimal move). Hence, the conditions applying to the second and fourth
equations above are satisfied.

The above four formulas can be condensed into the following inequality:

\[
\mathrm{Mean}(P_i) \ge \mathrm{Mean}(G_i) + l_{Li}\,\sigma - l_{Ri}\,\sigma
\]

where $l_{Li}$ is the number of moves made by Left in $G_i$ and $l_{Ri}$ is
the number of moves made by Right in $G_i$. Taking the sum of the
inequalities for all i, $1 \le i \le n$, produces

\[
\mathrm{Mean}(P) \ge \mathrm{Mean}(G) + l_L\,\sigma - l_R\,\sigma \qquad (5.2)
\]
where $l_L$ and $l_R$ are the number of moves made by Left and Right
respectively.

Consider the number of moves made when G is reduced to P. If it is
even, then $l_L = l_R$ and it is Left's turn to move in P. By the induction
hypothesis and equation 5.2,

\[
V_L(P) \ge \mathrm{Mean}(P) \ge \mathrm{Mean}(G).
\]

If the number of moves is odd, then $l_L = l_R + 1$ and it is Right's turn to
move in P. Play must have ended due to the β stopping condition. Hence,
$\max\{\sigma(P_i) \mid 1 \le i \le n\} < \sigma$. This, combined with the
dual of the induction hypothesis and equation 5.2, yields:

\begin{align*}
V_R(P) &\ge \mathrm{Mean}(P) - \max\{\sigma(P_i) \mid 1 \le i \le n\} \\
       &\ge \mathrm{Mean}(P) - \sigma \\
       &\ge \mathrm{Mean}(G) + \sigma - \sigma \\
       &=   \mathrm{Mean}(G)
\end{align*}

Thus, by playing σ-optimal moves Left can reduce G to some game P
such that the value of P is greater than or equal to the mean value of G.
The left hand side of the inequality is proven.

To prove the right hand side of the inequality, it is sufficient to show
that if Right uses the follow the leader strategy, then G will be reduced to
a game $P = P_1 + \cdots + P_n$ such that the value of P is at most
$\mathrm{Mean}(G) + \sigma$.

Consider a component game $P_i$ of P. By construction Right will never
play first in a game. Hence, the duals of the first two equations of Lemma
5.1 specify how the mean of each $P_i$ compares to the mean value of the
corresponding $G_i$. That is, for each i, $1 \le i \le n$:

\begin{align*}
\mathrm{Mean}(L_{2k}^{-\sigma}G_i)   &\le \mathrm{Mean}(G_i) \\
\mathrm{Mean}(L_{2k+1}^{-\sigma}G_i) &\le \mathrm{Mean}(G_i) + \sigma
\end{align*}

Proceeding as before, the above equations are reduced to

\[
\mathrm{Mean}(P_i) \le \mathrm{Mean}(G_i) + l_{Li}\,\sigma - l_{Ri}\,\sigma.
\]

Summing the above inequality for all i yields:

\[
\mathrm{Mean}(P) \le \mathrm{Mean}(G) + l_L\,\sigma - l_R\,\sigma \qquad (5.3)
\]
Consider the number of moves made when G is reduced to P. If it
is even, then $l_L = l_R$ and it is Left's move in P. Play must have ended
due to the β stopping condition. Hence,
$\max\{\sigma(P_i) \mid 1 \le i \le n\} < \sigma$. This, combined with the
dual of the induction hypothesis and equation 5.3, yields:

\begin{align*}
V_L(P) &\le \mathrm{Mean}(P) + \max\{\sigma(P_i) \mid 1 \le i \le n\} \\
       &\le \mathrm{Mean}(P) + \sigma \\
       &\le \mathrm{Mean}(G) + \sigma
\end{align*}

If the number of moves is odd, then $l_L = l_R + 1$ and it is Right's move
in P. By the dual of the induction hypothesis and equation 5.3:

\[
V_R(P) \le \mathrm{Mean}(P) \le \mathrm{Mean}(G) + \sigma.
\]

Thus, the right hand side of the inequality is proven. ∎


5.1.3 Improving Hanner’s Bounds 


Hanner's strategy is actually better than Hanner claimed. Consider what
happens to Hanner's bounds as G' in Figure 5.5 is played out. Initially,
Hanner's bounds are quite accurate, i.e., if Left plays first in G' then the
final score will be between -90 and -64. After Left moves, the game becomes
more volatile and the accuracy of Hanner's bounds decreases, i.e.,
$-110 \le V_R(G) \le 10$.

However, the bounds computed for G' also apply to G. Consider the
upper bound on G'. Hanner proves that no matter how Left plays, Right has
a strategy that guarantees to hold the final score to at most -64. In
particular, if Left moves to G, Hanner proves that Right has a strategy that
guarantees the final score will be less than or equal to -64. Therefore,
Right, starting from G, has a strategy that holds the final score down to
-64, and it is fair to claim that $V_R(G) \le -64$.

This suggests an interesting way to improve on Hanner's bounds. Assume
that the game originally began with Right to move first in G. Hanner's
bounds would not be very informative. Better bounds could be computed
simply by imagining that the game actually began with G' and Left to
move.

Figure 5.5: Hanner's Bounds as G' Is Played. Before Left's move, Hanner's
bounds are $-90 \le V_L(G') \le -64$; after Left's move using Hanner's
strategy, Hanner's bounds are $-90 \le V_R(G) \le 10$.

Figure 5.6: Constructing the Thermograph for H'

What follows is a proof that the value of a game
$G = G_1 + \cdots + G_n$ can be estimated to within the second largest
$\sigma(G_i)$. The basic idea is that a new game G' is constructed such that
$\sigma(G')$ equals the second largest $\sigma(G_i)$, and in a way that
guarantees that Hanner's bounds on G' also apply to G. The ability to
construct such a game is proven in the following two lemmas:


Lemma 5.2 For any game H and number $x > 0$, a new game H' can be
constructed such that $\sigma(H') = x$ and
$\forall s,\ L_1^{s-}(H') = H$, i.e., the only Left option from H' is to
move to H.


Proof: Let H' be { y | H } for some value of y. Consider an arbitrary
thermograph for H and the resulting thermograph for H' as shown in Figure
5.6. It is obvious that one can always set the value of y such that
$\sigma(H')$ is as high or as low as one wants. ∎


Lemma 5.3 Let $G = G_1 + \cdots + G_n$ and
$\sigma = \max\{\sigma(G_i) \mid 1 \le i \le n\}$. Then,

\[
\mathrm{Mean}(G) \le V_R(L_1^{\sigma-}G) \le \mathrm{Mean}(G) + \sigma.
\]
Proof: The basic idea is that Hanner's bounds on G also apply to the game
that results when Left plays a σ-optimal move.

Hanner proves that if Left moves to $L_1^{\sigma-}G$ then he can force the
final score to be greater than or equal to Mean(G), i.e.,

\[
\mathrm{Mean}(G) \le V_R(L_1^{\sigma-}G).
\]

Similarly, Hanner proves that for any Left move in G, Right can respond
such that the final score is at most $\mathrm{Mean}(G) + \sigma$. In
particular, if Left moves to $L_1^{\sigma-}G$, Right can respond such that
the final score is at most $\mathrm{Mean}(G) + \sigma$. Hence,

\[
V_R(L_1^{\sigma-}G) \le \mathrm{Mean}(G) + \sigma. \qquad ∎
\]


Theorem 5.2 Let $G = G_1 + G_2 + \cdots + G_n$ where
$\forall i,\ 1 \le i \le n-1,\ \sigma(G_i) \ge \sigma(G_{i+1})$. Then, it is
possible to approximate the value of $G_1 + \cdots + G_n$ to within
$\sigma(G_2)$ of the optimal value.


Proof: Construct a new game $H_1$ out of $G_1$ such that
$\forall s,\ L_1^{s-}H_1 = G_1$ and $\sigma(H_1) = \sigma(G_2)$. Lemma 5.2
guarantees that this can be done.
Let $G' = H_1 + G_2 + \cdots + G_n$. Lemma 5.3 guarantees

\[
\mathrm{Mean}(G') \le V_R(L_1^{\sigma-}G') \le \mathrm{Mean}(G') + \sigma(G').
\]

One σ-optimal move in G' is to move in $H_1$. This reduces G' to G. So,

\[
\mathrm{Mean}(G') \le V_R(G) \le \mathrm{Mean}(G') + \sigma(G').
\]

Furthermore:

\[
\sigma(G') = \max\{\sigma(H_1), \sigma(G_2), \ldots, \sigma(G_n)\}
           = \sigma(H_1) = \sigma(G_2)
\]

Hence,

\[
\mathrm{Mean}(G') \le V_R(G) \le \mathrm{Mean}(G') + \sigma(G_2). \qquad ∎
\]
                H23      H24      H25
  Mean        -42.5       -5      -25
  sigma         7.5        3        5
  ^5V_L         -40       -5      -25
  ^5V_R         -45       -5      -25

Figure 5.7: Defining H23, H24, and H25


5.1.4 Thermostatic Strategy 


Berlekamp, Conway, and Guy [2] also improved on the mean strategy. In
particular, if $G = G_1 + \cdots + G_n$, then for all t,

\begin{align*}
{}^{t}V_L(G_1) + {}^{t}V_R(G_2) + \cdots + {}^{t}V_R(G_n) &\le {}^{t}V_L(G) \\
  &\le {}^{t}V_L(G_1) + \cdots + {}^{t}V_L(G_n) + t
\end{align*}

This is a generalization of Hanner's and Milnor's results. If $t = 0$, then
the result is identical to Milnor's. If $t = \sigma(G)$ then the result is
identical to Hanner's.

For example, consider the three games, $H_{23}$, $H_{24}$, and $H_{25}$,
shown in Figure 5.7. By Hanner's result (Theorem 5.1):

\[
-72.5 \le V_L(H_{23} + H_{24} + H_{25}) \le -65
\]

However, if $t = 5$ then Berlekamp, Conway, and Guy prove that:

\[
-70 \le V_L(H_{23} + H_{24} + H_{25}) \le -65
\]
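
Both pairs of bounds follow by direct substitution of the values listed in Figure 5.7. The arithmetic below is a worked check only; it assumes the figure's values are as read here and takes $H_{23}$ as the first component in the thermostatic bound.

```latex
\begin{align*}
\text{Hanner (lower):}\quad  & -42.5 + (-5) + (-25) = -72.5 \\
\text{Hanner (upper):}\quad  & -72.5 + \max\{7.5,\ 3,\ 5\} = -65 \\
t = 5 \text{ (lower):}\quad  & {}^{5}V_L(H_{23}) + {}^{5}V_R(H_{24}) + {}^{5}V_R(H_{25})
                               = -40 + (-5) + (-25) = -70 \\
t = 5 \text{ (upper):}\quad  & {}^{5}V_L(H_{23}) + {}^{5}V_L(H_{24}) + {}^{5}V_L(H_{25}) + 5
                               = -40 + (-5) + (-25) + 5 = -65
\end{align*}
```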


Berlekamp, Conway, and Guy [2] provide an efficient algorithm, based on
thermographs, for finding the t that produces the tightest bounds, and they
give an argument for the correctness of the bounds. What follows is a proof
of their results consistent with the notation developed here.
Theorem 5.3 Let $G = G_1 + \cdots + G_n$. For all t, Left has a strategy such
that he can guarantee:

\begin{align*}
\alpha:&\quad {}^{t}V_R(G) \ge {}^{t}V_R(G_1) + \cdots + {}^{t}V_R(G_n) - t \\
\beta:&\quad  {}^{t}V_L(G) \ge {}^{t}V_L(G_1) + {}^{t}V_R(G_2) + \cdots + {}^{t}V_R(G_n)
\end{align*}

Proof: First, assume that part β is true. To show that part α follows,
assume that Right moves in $G_1$. By induction:

\begin{align*}
{}^{t}V_R(G) &= {}^{t}V_L({}^{t}R_1^{O-}(G_1) + G_2 + \cdots + G_n) \\
             &\ge {}^{t}V_L({}^{t}R_1^{O-}(G_1)) + {}^{t}V_R(G_2) + \cdots + {}^{t}V_R(G_n)
             \qquad (5.4)
\end{align*}

By the dual of equation 2.3, and since the value of a game can only increase
if Right plays non-optimal moves:

\[
{}^{t}V_R(G_1) - t = {}^{t}V_L({}^{t}R_1(G_1)) \le {}^{t}V_L({}^{t}R_1^{O-}(G_1))
\]

Replacing ${}^{t}V_L({}^{t}R_1^{O-}(G_1))$ with ${}^{t}V_R(G_1) - t$ in
Equation 5.4 yields:

\[
{}^{t}V_R(G) \ge ({}^{t}V_R(G_1) - t) + {}^{t}V_R(G_2) + \cdots + {}^{t}V_R(G_n)
\]

Now assume that part α is true. Left has a strategy that guarantees β.
In particular, Left makes an optimal move in some game, say $G_1$, such
that $\sigma(G_1) \ge t$. By induction:

\begin{align*}
{}^{t}V_L(G) &= {}^{t}V_R({}^{t}L_1(G_1) + G_2 + \cdots + G_n) \\
             &\ge {}^{t}V_R({}^{t}L_1(G_1)) + {}^{t}V_R(G_2) + \cdots + {}^{t}V_R(G_n) - t
             \qquad (5.5)
\end{align*}

Since $\sigma(G_1) \ge t$, by equation 2.3:

\[
{}^{t}V_L(G_1) = {}^{t}V_R({}^{t}L_1(G_1)) - t
\]

So replacing ${}^{t}V_R({}^{t}L_1(G_1)) - t$ with ${}^{t}V_L(G_1)$ in
Equation 5.5 yields:

\[
{}^{t}V_L(G) \ge {}^{t}V_L(G_1) + {}^{t}V_R(G_2) + \cdots + {}^{t}V_R(G_n)
\]

If there is no component game $G_i$ for which $\sigma(G_i) \ge t$ then the
above argument does not hold. However, in that case β by Equation 2.6
reduces to:

\[
\mathrm{Mean}(G_1 + \cdots + G_n) = \mathrm{Mean}(G_1) + \cdots + \mathrm{Mean}(G_n)
\]

The linearity of the mean value function was proved by Hanner [7]. If
[Game trees for $H_{26}$ and $H_{27}$]

Mean($H_{26}$) = 45         Mean($H_{27}$) = 20
$\sigma(H_{26})$ = 55       $\sigma(H_{27})$ = 60
${}^0V_L(H_{26})$ = 50      ${}^0V_L(H_{27})$ = 30
${}^0V_R(H_{26})$ = 45      ${}^0V_R(H_{27})$ = 10


Figure 5.8: Defining $H_{26}$ and $H_{27}$


5.1.5 Comparing Results 


The thermostatic strategy and the second highest $\sigma$ result both improve
upon Hanner’s result. Often they produce the same bounds. For example,
they both produce the same bounds for $H_{23} + H_{24} + H_{25}$, where $H_{23}$, $H_{24}$,
and $H_{25}$ are defined in Figure 5.7.

However, there exist sums of games for which each strategy is superior.
For example, consider the games shown in Figure 5.8. The second largest
$\sigma$ method can only bound the value of the game to within 55. The
thermostatic strategy produces the best bounds when $t = 0$, in which case:


$${}^0V_L(H_{26}) + {}^0V_R(H_{27}) \le V_L(H_{26} + H_{27}) \le {}^0V_L(H_{26}) + {}^0V_L(H_{27}) + t$$
$$50 + 10 \le V_L(H_{26} + H_{27}) \le 50 + 30 + 0$$
$$60 \le V_L(H_{26} + H_{27}) \le 80$$


The accuracy of the bounds is 20. 
On the other hand, cases exist where the second highest $\sigma$ produces
better bounds. For example, consider the games shown in Figure 5.9.$^2$


$^2$ A slightly more complex example that has exactly the same behavior is
$\{25, \{50 \mid 0\} \mid -75\} + \{20 \mid -20\}$.
$H_{28} = \{25 \mid -75\}$          $H_{29} = \{20 \mid -20\}$

Mean($H_{28}$) = $-25$        Mean($H_{29}$) = 0
$\sigma(H_{28})$ = 50         $\sigma(H_{29})$ = 20
${}^0V_L(H_{28})$ = 25        ${}^0V_L(H_{29})$ = 20
${}^0V_R(H_{28})$ = $-25$     ${}^0V_R(H_{29})$ = $-20$


Figure 5.9: Defining $H_{28}$ and $H_{29}$
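The mean and temperature listed in Figure 5.9 can be reproduced from the leaf values. The sketch below is an illustration only: it assumes both games are simple switches, $\{25 \mid -75\}$ and $\{20 \mid -20\}$, as the leaf values in the figure suggest, and it uses the standard facts that a switch $\{a \mid b\}$ with $a \ge b$ has mean $(a+b)/2$ and temperature $(a-b)/2$. The function name is hypothetical.

```python
def switch_mean_and_temperature(left, right):
    """Mean and temperature of the switch {left | right}, with left >= right."""
    if left < right:
        raise ValueError("a switch needs left >= right")
    mean = (left + right) / 2
    temperature = (left - right) / 2
    return mean, temperature


# Leaf values read from Figure 5.9 (assumed to define two switches).
print(switch_mean_and_temperature(25, -75))  # (-25.0, 50.0)
print(switch_mean_and_temperature(20, -20))  # (0.0, 20.0)
```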


The thermostatic strategy states that the best bounds are obtained when
$t = 0$. In particular:


$${}^0V_L(H_{28}) + {}^0V_R(H_{29}) \le V_L(H_{28} + H_{29}) \le {}^0V_L(H_{28}) + {}^0V_L(H_{29}) + t$$
$$25 + (-20) \le V_L(H_{28} + H_{29}) \le 25 + 20 + 0$$
$$5 \le V_L(H_{28} + H_{29}) \le 45$$


So, the accuracy of the bounds is 40. However, using the second largest $\sigma$
result, it is possible to bound the value of the game to within 20.
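The comparison in this section can be replayed numerically. The sketch below is an illustration only: the helper names are hypothetical, the untaxed values and temperatures are copied from Figures 5.8 and 5.9, and "accuracy" is taken to mean the width of the bounding interval.

```python
def thermostatic_accuracy(v_left, v_right, t=0):
    """Thermostatic bounds and their width for a two-game sum G_1 + G_2.

    v_left and v_right are (value for G_1, value for G_2) pairs at tax t.
    """
    lower = v_left[0] + v_right[1]
    upper = v_left[0] + v_left[1] + t
    return lower, upper, upper - lower


def second_highest_temperature(temperatures):
    """Accuracy promised by the second highest temperature result."""
    return sorted(temperatures, reverse=True)[1]


# Figure 5.8: the thermostatic strategy is tighter (20 versus 55).
fig_5_8 = thermostatic_accuracy(v_left=(50, 30), v_right=(45, 10))
fig_5_8_second = second_highest_temperature([55, 60])

# Figure 5.9: the second highest temperature is tighter (20 versus 40).
fig_5_9 = thermostatic_accuracy(v_left=(25, 20), v_right=(-25, -20))
fig_5_9_second = second_highest_temperature([50, 20])

print(fig_5_8, fig_5_8_second)  # (60, 80, 20) 55
print(fig_5_9, fig_5_9_second)  # (5, 45, 40) 20
```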


5.2 Search 


Searching for the optimal solution to an instance of SUM requires expo- 
nential time. However, good pruning techniques can greatly cut down the 
time required to perform a search. In particular, the approximate solutions 
presented in this chapter provide a great deal of power in terms of pruning 
and directing the search for an optimal solution. 

One simple example occurs in an alpha-beta pruning search. The
effectiveness of the alpha-beta pruning technique is highly dependent on the
order in which the nodes are searched [12]. The approximate algorithms can be
used to place an ordering on the nodes. One useful heuristic, for example, is
to first explore the search tree below the node that has the highest possible
payoff.
[Game trees for $H_{30}$, $H_{31}$, and $H_{32}$]

Mean($H_{30}$) = 1       Mean($H_{31}$) = 7.5       Mean($H_{32}$) = 0
$\sigma(H_{30})$ = 1     $\sigma(H_{31})$ = 7.5     $\sigma(H_{32})$ = 6


Figure 5.10: Defining $H_{30}$, $H_{31}$, and $H_{32}$


However, the alpha-beta pruning technique does not fully use the
information provided by the approximate solutions. For example, assume it is
Left’s move in the sum of games shown in Figure 5.10. Applying Theorem
5.1$^3$ to each of Left’s options yields:


$$9 \le V_L(L_1H_{30} + H_{31} + H_{32}) \le 16.5$$
$$11 \le V_L(H_{30} + L_1H_{31} + H_{32}) \le 17$$
$$18.5 \le V_L(H_{30} + H_{31} + L_1H_{32}) \le 28.5$$


It becomes immediately apparent that playing in $H_{32}$ is optimal. However,
an alpha-beta search would expand out the search tree in order to prove
that $H_{32}$ is optimal.
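The comparison just made amounts to an interval-dominance test, and the same intervals also supply the move ordering suggested for alpha-beta. What follows is a minimal sketch, assuming each option is summarized by a (lower, upper) pair; the function names are hypothetical and the numbers are the three bounds given for Left's options in Figure 5.10.

```python
def order_by_upper_bound(options):
    """Return option names sorted so the highest possible payoff comes first."""
    return sorted(options, key=lambda name: options[name][1], reverse=True)


def dominant_option(options):
    """Return the option whose lower bound is at least every other option's
    upper bound, or None if the intervals do not single one out."""
    for name, (lower, _) in options.items():
        other_uppers = [hi for other, (_, hi) in options.items() if other != name]
        if all(lower >= hi for hi in other_uppers):
            return name
    return None


# Bounds for Left's three options in the sum shown in Figure 5.10.
options = {
    "move in H30": (9, 16.5),
    "move in H31": (11, 17),
    "move in H32": (18.5, 28.5),
}

print(order_by_upper_bound(options))
print(dominant_option(options))  # move in H32
```

When no option dominates, `dominant_option` returns `None` and the ordering alone can still guide which subtree to search first.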

Theorem 5.1 does not guarantee that it will uniquely distinguish one
move as superior to all others. However, it does give valuable information 
about the relative merit of each move. One effective use of the information 
supplied by the approximate algorithms occurs in the B* algorithm [4]. 
What follows is a description of the algorithm. 


5.2.1 The B* Algorithm 


Each node in the search tree generated by B* search is associated with a range of
values such that the actual value of the node is guaranteed to be within that 


$^3$ Though Theorem 5.2 and Theorem 5.3 produce better bounds, the bounds produced
by Theorem 5.1 are easier for the author to produce and for the reader to verify. Theorem
5.1 will be used exclusively throughout this section.
$H_{33} = \{\{40 \mid 10\} \mid -20\}$     $H_{34} = \{\{\{31 \mid 21\} \mid 20\} \mid -15\}$     $H_{35} = \{\{4 \mid -2\} \mid -5\}$

Mean($H_{33}$) = 2.5       Mean($H_{34}$) = 3       Mean($H_{35}$) = $-2$
$\sigma(H_{33})$ = 22.5    $\sigma(H_{34})$ = 18    $\sigma(H_{35})$ =


Figure 5.11: Defining $H_{33}$, $H_{34}$, and $H_{35}$


range. This information is used both to direct the search, and to terminate 
the search promptly. 

At any point in the search, the range of values associated with a node
has either been determined via an evaluation function like Theorem 5.1, or
been derived by backing up information obtained through exploration of
the search tree below the given node. For example, assume that the search
begins with Left to move in the sum of games defined in Figure 5.11. By
Theorem 5.1, $3 \le V_L(H_{33} + H_{34} + H_{35}) \le 26$. However, after exploring the
search tree as shown in Figure 5.12, it becomes apparent that, at worst,
Left can move to $\{40 \mid 10\} + H_{34} + H_{35}$ and get 8. Hence, the bounds can
be improved to: $8 \le V_L(H_{33} + H_{34} + H_{35}) \le 26$.

The basic B* algorithm consists of expanding a leaf of the search tree, 
using the evaluation function to compute the range of that node, and then 
backing up the new information. Assuming the search is looking for the 
optimal move for Left, the search is terminated when the lower bound on
one of Left’s options is greater than or equal to the upper bounds of all the
other options.

The key question in B* search is which node on the frontier to expand.
What distinguishes B* from other search strategies, like best-first, is that
B* employs two different strategies. The first strategy, called Disprove-Rest,
deepens the search tree under the second best node, trying to decrease its
upper bound. The other strategy, called Prove-Best, deepens the search
tree under the “best” candidate node, trying to increase that node’s lower
bound.
$H_{33} + H_{34} + H_{35}$
$3 \le V_L \le 26 \Rightarrow 8 \le V_L \le 26$

    $\{40 \mid 10\} + H_{34} + H_{35}$:   $8 \le V_R \le 26$
    $H_{33} + \{\{31 \mid 21\} \mid 20\} + H_{35}$:   $-1 \le V_R \le 21.5$
    $H_{33} + H_{34} + \{4 \mid -2\}$:   $-16 \le V_R \le 6.5$


Figure 5.12: Expanding the Search Tree One Level


The power of the B* search comes from its ability to use both strategies. 
In game situations, it is often easier to prove that a move is bad than to
prove that a move is good. For example, Figure 5.13 shows the search
tree in Figure 5.12 expanded one level using the Disprove-Rest strategy.
The expansion lowers the runner-up’s upper bound from 21.5 to 2. This
is enough to terminate the search, since the best node’s lower bound (8)
is now greater than the other nodes’ upper bounds (2, 6.5). However, if
the tree in Figure 5.12 is explored using Prove-Best, then the search does
not terminate as quickly. Figure 5.14 shows the best node expanded one
level. The expansion produces tighter bounds for the best node, but does
not produce a lower bound greater than 21.5.
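The two expansions can be replayed with a few lines of code. This is a sketch under stated assumptions rather than Berliner's implementation: intervals are plain (lower, upper) pairs copied from Figures 5.12 to 5.14, a node where Right is to move is assumed to back up the minimum of its children's bounds, and the function names are hypothetical.

```python
def back_up_right_to_move(children):
    """Right minimizes, so both bounds are the minimum over the children."""
    lower = min(lo for lo, _ in children)
    upper = min(hi for _, hi in children)
    return lower, upper


def search_terminated(best, others):
    """Stop when the best option's lower bound is at least every other
    option's upper bound."""
    return all(best[0] >= hi for _, hi in others)


# Figure 5.12: bounds on Left's three options before any deepening.
best, runner_up, third = (8, 26), (-1, 21.5), (-16, 6.5)
before = search_terminated(best, [runner_up, third])

# Disprove-Rest (Figure 5.13): expand the runner-up.
runner_up_after = back_up_right_to_move([(-1, 2), (26.5, 40), (18.5, 41)])
after_disprove = search_terminated(best, [runner_up_after, third])

# Prove-Best (Figure 5.14): expand the best node instead.
best_after = back_up_right_to_move([(11, 20), (23, 41), (8, 23)])
after_prove = search_terminated(best_after, [runner_up, third])

print(before)                           # False
print(runner_up_after, after_disprove)  # (-1, 2) True
print(best_after, after_prove)          # (8, 20) False
```

In this instance one Disprove-Rest expansion ends the search, while one Prove-Best expansion tightens the best node's range without separating it from the runner-up.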

Berliner [4] suggested two rules for choosing between Prove-Best and 
Disprove-Rest. The first favored exploration of subtrees that had not been
explored deeply. The second favored the exploration of nodes with large
ranges. Palay [19] obtained better results by assuming that the actual
value of a node is uniformly distributed over the range of the node, and
then choosing the strategy with the highest probability of success. Palay
[20] improved upon his previous work to allow for an arbitrary probability
distribution, as opposed to assuming uniform distributions.
$H_{33} + H_{34} + H_{35}$
$3 \le V_L \le 26 \Rightarrow 8 \le V_L \le 26$

    $\{40 \mid 10\} + H_{34} + H_{35}$:   $8 \le V_R \le 26$
    $H_{33} + \{\{31 \mid 21\} \mid 20\} + H_{35}$:   $-1 \le V_R \le 21.5 \Rightarrow -1 \le V_R \le 2$
        $-20 + \{\{31 \mid 21\} \mid 20\} + H_{35}$:   $-1 \le V_L \le 2$
        $H_{33} + \{31 \mid 21\} + H_{35}$:   $26.5 \le V_L \le 40$
        $H_{33} + \{\{31 \mid 21\} \mid 20\} - 5$:   $18.5 \le V_L \le 41$
    $H_{33} + H_{34} + \{4 \mid -2\}$:   $-16 \le V_R \le 6.5$


Figure 5.13: The Search Tree Expanded using Disprove-Rest
$H_{33} + H_{34} + H_{35}$
$3 \le V_L \le 26 \Rightarrow 8 \le V_L \le 26$

    $\{40 \mid 10\} + H_{34} + H_{35}$:   $8 \le V_R \le 26 \Rightarrow 8 \le V_R \le 20$
        $10 + H_{34} + H_{35}$:   $11 \le V_L \le 20$
        $\{40 \mid 10\} + H_{34} - 5$:   $23 \le V_L \le 41$
        $\{40 \mid 10\} + (-15) + H_{35}$:   $8 \le V_L \le 23$
    $H_{33} + \{\{31 \mid 21\} \mid 20\} + H_{35}$:   $-1 \le V_R \le 21.5$
    $H_{33} + H_{34} + \{4 \mid -2\}$:   $-16 \le V_R \le 6.5$


Figure 5.14: The Search Tree Expanded using Prove-Best




Chapter 6 


Endgame of Go 


The endgame of Go is naturally characterized as a sum of games. During 
a game of Go, the players place black and white stones down on a square 
grid trying to surround territory for themselves and to invade the poten- 
tial territory of their opponent. In the beginning and middle game, it is 
very common for a move to have both a local and global effect. However, 
towards the end of the game, the board becomes divided up into a number 
of separate regions and each move has only a local effect. 

For example, Figure 6.1 shows a simplistic Go position that has reached 
the final stages of the endgame. Black has taken a large center territory 
while white has carved out four separate corner territories. The rest of the 
game revolves around a number of small, distinct, border disputes. 

The endgame of Go is an important example of a sum of games because 
of the widespread interest in the game. Go is played professionally in the
Orient at the level that chess is played in the West. It is common for a 
game between professionals to be very close (within one point) and hence 
the endgame has been studied extensively, e.g., [18]. 

Furthermore, Go has attracted a reasonable amount of attention from 
the A.I. community. Unlike Chess, Go does not simply succumb to brute 
force search. For that reason, Berliner [3] said that “even if a full-width 
search program were to become World Chess Champion, such an approach 
cannot possibly work for Go, and this game may have to replace chess as the 
task par excellence for A.I.” Interesting work has been done on constructing 
Go playing programs, e.g., [1], [13], [23], and [24]. 

Though the endgame of Go is naturally described as a sum of games, 
Figure 6.1: An end position in a game of Go


Assuming it is Black’s turn to move, optimal play is the following:
Black plays at a, White responds at b, Black plays at ¢ taking
two white stones, and White responds at © capturing two black
stones. At that point, the position has repeated itself. Under
Japanese rules the game is considered a draw.


Figure 6.2: Chōsei


issues arise when the theory of a sum of games is applied to the endgame
of Go. This chapter addresses those issues.

This chapter assumes that the reader is familiar with the basic rules of
Go. A good introduction to the game can be found in How to Play Go by
Takagawa [28].


6.1 Ko 


Repetitive situations requiring special rules often occur in board games. 
For example, in Chess, if a board position is repeated three times, then 
the rules state that the game is a draw. Similarly, using the Japanese Go 
rules, some repetitive situations result in a drawn game. Such situations 
are rare, but do exist. For example, Figure 6.2 shows one such situation,
called chōsei, meaning eternal life [29].

One instance of a potentially repetitive situation that often occurs in Go 
is called ko. The two typical configurations of a ko are shown in Figure 6.3. 
Without a special rule, both positions could result in an infinite
capture-recapture sequence. The ko rule prevents an infinite repetitive sequence by
stating that if one player takes a ko, then his opponent cannot retake the
ko on the next turn.
If Black plays at a, then by the ko rule, White cannot play at
@ on the next turn.


Figure 6.3: Ko 


Consider a ko position as in Figure 6.3. When it is important to the 
players whether Black ends up with a stone at a or whether White ends
up with a stone at ©, a ko fight ensues. The basic form of a ko fight is
the following: Black takes the ko. Unable to immediately retake due to
the ko rule, White plays a ko threat, i.e., a threatening move elsewhere on
the board. Black, unwilling to watch White carry through on the threat, 
responds to the ko threat. White then takes the ko. Black is now in the 
position White was at the start of the fight. Black wants to take the ko 
but is unable to because of the ko rule. So, Black makes a ko threat. The 
ko fight continues until one player ignores his opponent’s ko threat and fills 
the ko. 

Ko fights are an important part of the game of Go. They occur
frequently in the endgame. Unfortunately, the nature of kos and of a ko fight
is very different from that of the games considered thus far.

The ko rule, along with other special case rules, prevents repetitive board
positions. However, when viewed in a local situation, a ko is a repetitive
situation. In particular, assume that the endgame of Go is represented as
a sum of small border disputes and assume that one of the border disputes
involves a ko. During the ko fight, the border position containing the ko
will repeatedly oscillate between two positions, i.e., Black has the ko, or
White has the ko.

This cyclic nature prevents ko from being represented as a finite Conway
game. One could imagine representing ko as an infinite tree or as a
graph. However, such a representation is outside of the theory as currently
known.
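The obstacle can be seen by treating local positions as nodes of a directed graph. The sketch below is a simplified, hypothetical model and not from the thesis: the state names, the move lists, and the `has_cycle` helper are all assumptions chosen for illustration. A finite game tree never revisits a position, whereas the local ko graph does.

```python
def has_cycle(graph):
    """Depth-first search for a directed cycle in a position graph."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    def visit(node):
        state[node] = ACTIVE
        for successor in graph[node]:
            if state[successor] == ACTIVE:
                return True  # reached a position still on the current path
            if state[successor] == UNSEEN and visit(successor):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNSEEN and visit(node) for node in graph)


# Local view of a ko: a recapture leads to the other ko position, and the
# player holding the ko may instead fill it, which ends the local fight.
ko = {
    "Black has the ko": ["White has the ko", "Black fills the ko"],
    "White has the ko": ["Black has the ko", "White fills the ko"],
    "Black fills the ko": [],
    "White fills the ko": [],
}

# An ordinary switch, for contrast: every move leads to a terminal value.
switch = {"{20 | -20}": ["20", "-20"], "20": [], "-20": []}

print(has_cycle(ko))      # True
print(has_cycle(switch))  # False
```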

The notion of mean value and temperature of a game is crucial to the 
heuristic solutions developed in section 5.1. However, these notions do not 
make sense in relation to ko. The computation of mean value and
temperature, via thermographs, is dependent upon the representation of a 
game as a finite tree. 


6.2 Complexity Results 


Lichtenstein and Sipser [15] proved that Go is PSPACE-hard. At first
glance it would appear that one could prove that the endgame of Go is also
PSPACE-hard by using Theorem 4.2 and constructing a reduction from
$SUM_{d \le 2}$ to the endgame of Go.$^1$ However, two difficulties arise with this
approach.

The first difficulty is in constructing a reduction from $SUM_{d \le 2}$ to the
endgame of Go. It is easy to construct a Go position that has the structure
of a switch or a left heavy tree, e.g., Figure 6.4. However, to transform
more complex games into Go positions is hard. It is not at all clear that
the games used in the reduction from QBF to $SUM_{d \le 2}$ correspond to Go
positions.

The basic problem is that a Go position has a great deal of structure 
that is not part of an arbitrary game. For example, in a Go position almost 
all the places Black can play White can play. It is hard to imagine a Go 
position where Black has 100 valid moves but White only has one. Hence, 
in any game corresponding to a Go position, Left and Right will have
an approximately equal number of options.

Similarly, the persistence of moves in a Go position greatly restricts the 
types of games that can be represented on a Go board. Generally speaking, 
if a move is initially available for Black, then the move remains available for 
Black until one player places a stone at that point. This type of structure
is uncommon. 

The games used in the reduction from QBF to $SUM_{d \le 2}$ in Theorem 4.2
are switches, left heavy trees, and the game shown in Figure 6.5. Switches


$^1$ The comments made in this section also apply to Morris’s [17] proof that SUM is
PSPACE-complete.
Assume that Left is playing white, and that the white group is
alive. Then, this position represents the tree:


Figure 6.4: Representing a Left Heavy Tree


Figure 6.5: A Key Game in the PSPACE-complete Proof




and left heavy trees are easy to represent on a Go board. However, there
is strong reason to believe that the game shown in Figure 6.5 cannot be
represented on a Go board.

The game in Figure 6.5 seems to violate Go’s persistence of moves.
Right has two moves. Right can make a very large threat or Right can 
take a quick profit. What’s unusual, however, is that if Right takes a quick 
profit, then he can no longer make the threat. That is, the move that takes 
the quick profit must also stabilize, invalidate, or in some way remove the 
other move. Similarly, if Right takes the quick profit, he can no longer 
make the threat. Constructing such a Go position seems unlikely. 

Even if one had a reduction from $SUM_{d \le 2}$ to the endgame of Go, a
problem remains. If the standard representation of a Go position is used,
then the chain of reductions from QBF to $SUM_{d \le 2}$, as in Theorem 4.2, to
the endgame of Go does not correspond to a polynomial time reduction. The
heart of the problem is that Go positions are unary representations, and
the reduction from QBF to $SUM_{d \le 2}$ uses positional numeric notation.

In particular, the standard representation of a Go position as an $n \times n$
square grid implies that the size of a Go position is proportional to its
value. For example, on a square grid, a position worth 30 corresponds
to either 30 open grid points, 15 grid points containing
15 prisoners, or some other combination of open points and prisoners.
Furthermore, though the reductions in Theorem 4.2 create an instance of
$SUM_{d \le 2}$ that is only polynomially larger than the initial input, the values of
the numbers in the constructed game are exponentially larger than the size
of the original input. Hence, if the reduction from QBF to the endgame of
Go used Theorem 4.2, then the resulting Go position would be exponentially
larger than the size of the original input.
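The size mismatch can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that a value $v$ costs about $v$ open grid points on the board (the unary encoding described above), compares that with the number of binary digits needed to write $v$, and uses hypothetical function names.

```python
import math


def unary_size(value):
    """Grid points needed when each point of territory is drawn on the board."""
    return value


def binary_size(value):
    """Digits needed to write the same value in positional binary notation."""
    return max(1, value.bit_length())


def min_board_side(value):
    """Smallest n such that an n x n grid has at least `value` points."""
    return math.isqrt(value - 1) + 1 if value > 0 else 0


# A value written with `bits` binary digits can need about 2**bits grid points.
for bits in (5, 10, 20, 40):
    value = 2 ** bits - 1
    print(bits, unary_size(value), min_board_side(value))
```

For the position worth 30 mentioned above, `binary_size(30)` is 5 while the board needs 30 points, so a grid of side at least 6.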

Morris [17] conjectures that this problem can be solved through the use
of a compact notation for fractional values. He comments that Go positions 
“can exhibit almost any form of composition from dyadic rationals.” How- 
ever, though fractional values are used to estimate Go positions, fractional 
values never actually appear in Go. The leaf values of any game tree cor- 
responding to a Go position are always integral. Hence, fractional notation 
will not solve the problem. 

The problem might be solved if a compact representation of Go positions
were allowed. For example, one could imagine a representation where
the territory surrounded by a group was represented by a binary number,
instead of by the corresponding number of empty grid points.


6.3 Approximation and Search Algorithms


Ignoring ko, the approximate solutions and search algorithms discussed in 
this thesis are applicable to the endgame of Go. One could imagine a Go 
playing program that represented each of the border disputes as a tree, 
and then determined its next move by Hanner’s heuristics, or by B* search 
combined with Hanner’s approximations. 

The biggest problem facing such a Go playing program is simply extracting
the game trees from the Go board. The success of the extraction
depends upon the program’s ability to recognize the decomposition of the
board into a number of separate border disputes, determine the life and
death of groups, and determine the connectivity of loosely related stones.
All these tasks are non-trivial.
Chapter 7


Future Research 


This thesis analyzes play in a sum of games from three different perspec- 
tives: computational complexity, heuristic solutions, and optimal search 
algorithms. This chapter summarizes the results and considers directions 
for future research. 


7.1 Complexity 


Determining the optimal strategy in a sum of games seems to be hard. 
Lockwood Morris [17] proved that SUM is PSPACE-complete. This thesis
proves that even when the component games in the sum are very small, i.e.,
maximum depth of two, the problem remains PSPACE-complete. However,
there may be interesting classes of sums of games that, even assuming
$P \ne PSPACE$, can be played optimally in polynomial time. Future
research is needed to determine exactly when the problem is
PSPACE-complete.

For example, both Morris’s PSPACE-completeness proof and the proof
of Theorem 4.2 rely on a game that has multiple options, i.e., the proofs
rely on a game in which Right is given a choice between two moves. The
question arises: does the difficulty of playing a sum of games depend upon
having multiple options? My conjecture, after spending a large amount of
time unsuccessfully trying to determine optimal play in a sum of left heavy
trees, is that SUM is PSPACE-complete even when the component games
have maximum depth two and have no multiple options.
Figure 7.1: A Sum of All Small Games


An interesting class of games is the all small [2] games. A game is all
small if all the leaf values are zero, and both players have legal moves from 
every non-terminal position. Thus, no subtree of the game represents a 
number. For example, Figure 7.1 shows a sum of all small games. 

Determining the optimal strategy in a sum of all small games seems 
hard. Some knowledge about how to play all small games can be obtained 
through Berlekamp, Conway, and Guy’s theory of atomic weights [2]. However,
I conjecture that the problem of determining the winning strategy in
a sum of all small games is also PSPACE-complete.

The possibility exists that SUM could be solved using a pseudo-polyno- 
mial time algorithm. The question is whether there exists a natural set of 
restrictions that can be applied to instances of SUM to yield a polynomial 
algorithm. For example, does limiting the size of the values of the terminal 
positions allow for a polynomial time algorithm? Unfortunately, given the 
difficulty of playing all small games, there is a real possibility that SUM is 
hard in a strong sense. The search for a pseudo-polynomial time algorithm and
an NP-hardness proof in the strong sense should proceed in parallel.
7.2 Heuristic Solutions


Hanner [7] proved that the final score of a sum of games can be approximated
to within the maximum temperature of a component game. This result
is quite remarkable. It bounds the value of a sum of games in a way
that is independent of the number of games in the sum and independent
of the complexity of the components. Though this thesis presents a clear
proof, a simple, short, intuitive argument for why the final score can be 
approximated to within the largest temperature is still needed. 

Central to Hanner’s proof is the concept of taxation. An important 
property of taxation is that it is linear. Let $G = G_1 + \cdots + G_n$ and let each
component game $G_i$ be approximated by the taxed version of that game.
Then one way to view Hanner’s proof is that G is approximated as the sum 
of the approximated component games. 

In the search for better heuristics, a natural way to proceed is to con- 
sider other linear approximation functions besides taxation. However, it 
is surprisingly hard to find linear functions. The mean function and any
multiple of the mean function are linear. Also, in [2], a linear function, called
overheating, is defined. However, the simpler heating function is not linear.

All the functions mentioned above, i.e., taxation, mean, heating, and
overheating, have the form


Mae Bee 


for some value of $c$, some predicate $P$, and some function $N$. It would
be useful to know the constraints on $P$ and $N$ such that the above form
produces a linear function.

Perhaps even more amazing than Hanner’s bounds is Hanner’s mean 
strategy. Hanner proved that if a player uses the mean strategy in


$$G = G_1 + \cdots + G_n,$$


then he can force the final score to be within $\max_i \sigma(G_i)$ of the optimal
score. This thesis proved that, by using the mean strategy, a player can
force the final score to be within the second largest $\sigma(G_i)$. However, by
using the mean strategy, a player often does even better than that. How
good is the mean strategy?




7.3 Search 


Searching for the optimal solution to an instance of SUM requires expo- 
nential time. However, the approximate solutions presented in this thesis 
provide a great deal of power in terms of pruning and directing the search 
for an optimal solution. In particular, the thesis suggests that the approx- 
imate solutions combined with Berliner’s B* search algorithm will reduce 
the time required to find the optimal solution. 

One direction for future research is to quantitatively measure how efficient
B* is. On average, what improvement does one gain using B* instead
of an alpha-beta search algorithm? Is there a simple way to characterize those
games that are solved efficiently using B*?

The power of the B* search comes from its use of both a Prove-Best and
a Disprove-Rest strategy. In the context of solving an instance of SUM, are
there any good heuristics for choosing one strategy over another?